Hello guys,
I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.
I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!
The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317
I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully even more useful!
Thanks and best regards,
Martijn van der Spek
Comments
GPT5
Blindpk, I have added GPT-5 to the model list as well. But be warned, it is really slow at the moment, possibly due to first-day traffic. For practical use, picking the nano model probably makes more sense for now.
My initial thoughts. How to copy paste images from Facebook
So I have tried out the copy-paste feature from Facebook, and it works pretty well. The only caveat is that whenever I allow the option, if I want to copy and paste a second image, I have to click the allow option again. Is there a way to optimize this feature so that I give the permission once and don’t have to repeat it?
Regarding the GPT-5 model and description, yes, I have to say that it's quite snappy and efficient. I do notice that the initial descriptions are shorter—at first it gives me a summary of what’s in the pictures and video content, but if I send a message, I can ask for more specific, detailed descriptions. I don’t know if you’re able to make it so we can get the long description without having to prompt for it. Right now, the model responds faster, but, yeah, the description is shorter on the first attempt. It's perfectly fine the way it is though.
I don’t know if I mentioned this in my last feedback, but it would be nice to have the magic tap to stop and start the description. Sometimes, when I’m dictating a message into the box while asking a follow up question, if I double tap, the voice starts speaking out the description. I’m not sure if this is something you can change so that I can dictate without the speech starting again, or maybe I should be pausing it. Either way, having the magic tap to stop and start the description and ensuring that when I’m dictating into the text box, the voice isn’t speaking when I tap, would help.
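For reference, supporting Magic Tap is a small amount of code on iOS. Below is a minimal sketch of the idea, assuming a hypothetical speechPlayer and lastDescription; it is not PiccyBot's actual code:

```swift
import UIKit
import AVFoundation

// Minimal sketch of Magic Tap (two-finger double-tap under VoiceOver).
// `speechPlayer` and `lastDescription` are hypothetical names.
class DescriptionViewController: UIViewController {
    let speechPlayer = AVSpeechSynthesizer()
    var lastDescription = ""

    // VoiceOver routes the Magic Tap gesture here; returning true
    // tells the system the gesture was handled.
    override func accessibilityPerformMagicTap() -> Bool {
        if speechPlayer.isSpeaking {
            speechPlayer.stopSpeaking(at: .immediate)
        } else {
            speechPlayer.speak(AVSpeechUtterance(string: lastDescription))
        }
        return true
    }
}
```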
These are some wonderful changes—big improvement. Very proud of you, please continue to keep up the good work. I appreciate this app so much, so thank you for dedicating yourself and putting in the hard work to ensure this product is the best it can be.
I want to clarify some confusion here. I think where a lot of people might have an issue is when copying a picture from Facebook. I was initially looking in the share sheet for the option, where you have the option to share the link, post to your story, or share to other profiles, but that’s not where the copy option is located. It’s actually in the same section where you have the option to save a photo to your device.

When you come across a picture on Facebook that has alt text—usually the automatically generated text by Facebook—it might look something like “Photo, may be an image of dog and grass.” When you hear this, you’re going to double tap on it, assuming you’re using VoiceOver, and it will bring up a “More” option within that section. When you double tap “More,” it will show you an option that says “Copy Photo.” Double tap that, and it will copy the photo to your clipboard. Be careful not to copy anything else until you get the description.

When you switch over to PiccyBot, go into the text box where it says something like “What’s in this photo?” or “What’s in this video?” Double tap and hold, and a box will pop up asking if you want to allow pasting from Facebook. Double tap “Allow,” and it will automatically send off that image so you can receive the description.
Re: GPT-5
Thanks a lot! Yes, GPT-5 is pretty much unusable right now with the processing time taking so long; GPT-5 Mini seems to work well though.
I found the copy image option in Facebook at last. It is really weird; I knew it used to be there, but earlier today it didn't show up at all for some reason. Anyway, the copy/paste feature works well. I agree with Winter Roses though: is it possible to only have to give permission once?
GPT-5 Chat
There is a model in the API called GPT-5 Chat, which is the same version of GPT-5 that powers ChatGPT. When I test it, it responds MUCH faster than standard GPT-5, though with rather short descriptions. It might be something to look into and add instead of the standard one. Here is the page about it:
https://platform.openai.com/docs/models/gpt-5-chat-latest
A Question About Model Storage of Sent Pictures
Hi Martijn,
First, thank you for all of the work you have done and continue to do for our community with PiccyBot. I never thought that such a service would exist, especially having access to multiple models.
I do have one question about how the AI models use the data we share through PiccyBot. I know that you/PiccyBot do not store or save the pictures uploaded by users; I am wondering, however, if you have any information about what the various AI companies do with the pictures that PiccyBot sends? I enjoy comparing descriptions from the various models, but I am uncomfortable having pictures of family/friends/anyone else described if these services store or utilize the pictures users send. Regardless of what the AI companies do, please understand that this is not a reflection on PiccyBot or the work that you have done for our community.
Thanks for any insight!
Re: Grok 4 will stay
I live in a region where cellular connection is more reliable than Wi-Fi but don't think this has anything to do with connection stability. I can't even get a description when I upload a photo. I just get a Retry button but the result doesn't change no matter how many times I retry. Other apps like Be My Eyes can provide image descriptions without any problems though.
Storage of images and connectivity
Enes, if you feel PiccyBot is 'stuck' while the network is fine, please either restart your phone or even reinstall the app. It should work again. It's an elusive issue that I will try to fix soon.
Michael, as said, PiccyBot doesn't store any media or prompts. And there is an additional layer of privacy since all requests to the providers come from the PiccyBot address, not yours. However, the AI providers can use your data in some cases. OpenAI says they won't, but you never know. Anthropic (Claude) has quite a good reputation, and Mistral, being European, is very privacy-conscious. Safest is Llama 4, since that runs on a local server and all data is removed immediately after use. The worst is likely Google. But it's hard to avoid them, especially with the Gemini 3 model around the corner, of which I have high expectations.
Re: pasting
Firstly, thank you so much for this new feature - I have been wanting an easy way to get facebook images described for ages.
I think I am being thick though as I can't find the option to paste.
In Facebook, I go to an image, double tap to view it, then double tap and hold for a bit to get the menu, and then I select Copy image. I then switch to PiccyBot... but where is paste? I presume I am repeating the same action as per Facebook - double tap and holding. But I can't find the option to paste. What should have the focus when I do this? I've tried the text box, heading and some of the buttons.
Sorry I know I'm always the last one to figure these things out. I think I am on the latest version - there was an update pending so I installed it before trying.
Purchased the premium features but cannot pick AI model
I can't interact with the AI model selection dropdown in the settings. Double-tapping does nothing. Also, there's this button labeled as "gear.badge.questionmark" that should be labeled more properly, likely "Help".
AI model selection
Enes, you somehow cannot access the Firebase database with the PiccyBot settings. Can you use a VPN or other network and try again?
Actually I have included a built in offline list, but that backup feature wasn't included in the latest release. I'll provide an update by Monday.
Mr Grieves, you can paste in the main view in PiccyBot with a long press. Press in the middle of the screen. It will then prompt 'PiccyBot would like to paste from Facebook', 'Do you want to allow this?', and then you can select 'Allow paste'.
Data storage
Checking the API terms of the different companies they say basically the same thing. All of them store your data for a limited time to check if it complies with their usage policies. How long this time is varies (and some are vague about it). AFAIK the Piccybot server is in the EU which means the companies have to follow GDPR, but you of course really never know. I'm not sure that any of the big companies are "better" or "worse" than others in that regard.
Using a VPN worked but...
I could access the model list and select Grok 4 to find out how it would describe images and videos, but then I forgot that it wouldn't be able to describe videos and captured a video. I did get a description afterwards, but the video was probably described by GPT4 or whatever the default/free model is. And now when I open the app, I have the free version interface with the Subscription button and an ad on the screen. I may try restarting the device or reinstalling the app, but just wanted to inform you in case you work on fixing such issues. Also, I'd love to know whether I will have to keep the VPN on even after selecting the AI model, to access the server and retrieve descriptions at all times, or only once.

Another thing is, why don't we have DeepSeek, Qwen or other models among the available ones? What models do provide video descriptions if not Grok 4? Can I not select any other model apart from GPT4 if I want to be able to get video descriptions as well as image descriptions?

And can I not customize the default/initial system prompt? This would be quite handy. It's actually somewhat strange that this feature is missing when we can even customize the personality of the voice and possibly the description as well. Or does the personality customization apply to the style and intonation of the voice only, rather than the content of the description?

Finally, adding Piper voices as an option might create a free option and help reduce costs for you. They're also neural voices even though they lack style customization. They're also open-source and can be deployed on any server. Wait, why not just use the system voice then?
* Update: I did uninstall and reinstall the app while writing this, but now the Restore Purchases button doesn't bring up the App Store screen to let me restore the purchase. Let me also add that the Turkish localization is incomplete.
* Update 2: Just disabled the VPN and finally got the premium screen back after double-tapping on the Restore Purchase button several times.
Another Question
It appears that my model configuration is stored on the server, not the device itself. I completely uninstalled and reinstalled the app as I mentioned above, and the other settings were reset to the defaults, but it was still Grok 4 that was selected as the AI model in use.
And here comes the question: What is PiccyBot Mix and how is it supposed to work?
And here's a suggestion: could the description be set to match the language of the content, if it is in any of the languages we specify in the settings? This could be useful for bilingual/multilingual people, those learning foreign languages, etc.
Update (more questions): What is "Blind native Style"? Is it a model? How exactly does the length parameter work? Does the number let you set the number of words per response? If so, should the description length depend on the content itself to a certain degree? What if we prompt PiccyBot to describe a long video? Will it still stick to the same description length and truncate the response?
Answers..
Thanks for the feedback!
The available models vary from time to time. PiccyBot had DeepSeek, but I replaced it with Llama 4 as that is a similar open-source model and I want to keep the list manageable. I also removed GPT-4o mini recently, as we now have the GPT-5 models.
Regarding the video descriptions, only the Gemini, Amazon and Reka models do that. The other models are image only. PiccyBot will default to Gemini Flash 2.5 for a video description when a different main model has been selected.
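In rough pseudocode, the routing works like the sketch below; this is illustrative only, not the exact implementation, and the model IDs are placeholders:

```swift
// Illustrative sketch of the video fallback; model IDs are placeholders.
enum Media { case image, video }

let videoCapableModels: Set<String> = ["gemini-2.5-flash", "amazon-nova", "reka"]

func model(for media: Media, selected: String) -> String {
    // Image-only models fall back to Gemini Flash 2.5 for videos.
    if media == .video && !videoCapableModels.contains(selected) {
        return "gemini-2.5-flash"
    }
    return selected
}
```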
The personality affects the tone of the voice and makes some adjustments to the style of the content. Turn it off for a clean description. I will likely add a few more voice options in the coming week. For the system voice, you can set the voice to 'None'.
I hope the network and VPN issues will improve. I will add more local settings and backup options to ensure the settings remain accessible even if the network cannot connect to the Firebase server or the PiccyBot server.
PiccyBot Mix uses a combination of descriptions given by OpenAI, Google and Mistral models, and keeps only the elements that are common to all. This should in principle all but eliminate hallucinations in the description. So use this model for the most accurate description. Image only.
Blind Native style uses an inbuilt prompt to ensure the description is relevant for people born blind, with more focus on touch and no reference to colors etc.
The length parameter basically determines the number of tokens used with the model. Set to 100, it will result in lengthier descriptions, while 10 will give a concise description. The response speed will be slower with a higher length setting. For a long video, set length to 100 for maximum detail in the description.
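Conceptually, the setting just scales the token budget sent with each request, something like the sketch below; the actual factor is internal and the one shown here is only an assumption:

```swift
// Sketch only: the real mapping is internal, and the scale factor
// used here is an assumption for illustration.
func maxTokens(forLength length: Int) -> Int {
    // Clamp the slider to its 10...100 range, then scale linearly.
    let clamped = min(max(length, 10), 100)
    return clamped * 10   // e.g. length 100 -> a 1000-token budget
}
```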
The video quality setting determines the amount of compression of the video when sending it to the server. Low is high compression (for free users) while high is no compression. Setting it to high will give more exact results at a cost of slower processing.
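On iOS, that kind of quality tier typically maps to an AVAssetExportSession preset. Here is a sketch of the idea, with assumed tiers rather than PiccyBot's exact ones:

```swift
import AVFoundation

// Sketch: choosing a compression preset from the quality setting.
// The tiers are assumptions, not PiccyBot's actual code.
func exportPreset(forQuality quality: String) -> String {
    switch quality {
    case "low":  return AVAssetExportPresetLowQuality   // heavy compression
    case "high": return AVAssetExportPresetPassthrough  // no re-encoding
    default:     return AVAssetExportPresetMediumQuality
    }
}
```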
Hope this helps!
Re: pasting
I'm not sure how that works with VoiceOver. I don't really have a "main form" that I can give focus to as far as I know. I can select all the elements in it, but not the form itself.
I have managed to get it to work a couple of times but I think it was pure luck.
Has anyone managed to do this with VoiceOver?
Re: pasting
Mr Grieves, it should be double tap and hold. But you are not the only one having trouble getting it to work. I will try to make it automatic in the next update: if PiccyBot finds you have an image on your clipboard, it will prompt you with a question asking whether you wish to paste it.
However, Apple is tricky with this, as they want only user initiated actions, not automatic ones, so they may not approve. Let's see..
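For the technically curious: checking the clipboard's metadata (hasImages) should not trigger the system paste banner; only actually reading the image does. A simplified sketch of the planned prompt, not the exact implementation:

```swift
import UIKit

// Sketch: `hasImages` only inspects metadata, so it should not raise
// the system paste prompt; reading `.image` inside the handler does.
func promptIfClipboardHasImage(in controller: UIViewController) {
    guard UIPasteboard.general.hasImages else { return }
    let alert = UIAlertController(title: "Describe the image on your clipboard?",
                                  message: nil, preferredStyle: .alert)
    alert.addAction(UIAlertAction(title: "Allow paste", style: .default) { _ in
        // The system permission banner appears here, on the actual read.
        if let image = UIPasteboard.general.image {
            print("Hand \(image.size) off to the description pipeline")  // placeholder
        }
    })
    alert.addAction(UIAlertAction(title: "Cancel", style: .cancel))
    controller.present(alert, animated: true)
}
```

On iOS 16 and later there is also Apple's UIPasteControl button, which performs a paste without showing the permission alert each time; that may be the cleanest answer to the 'allow every time' complaints.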
Copy paste solution, maybe?
I use VoiceOver, and yes, I have gotten the copy-paste feature on Facebook to work with the app. I will say though, it can be tricky, because from what I understand, you have to be positioned right at the start of the line in the text box for it to work, and it has to be done pretty precisely. Isn’t there a way this could be part of the rotor?
You know how, when using the phone, there’s usually a box or menu with edit options on the rotor that include “Select,” “Select All,” “Copy,” “Paste,” “Share,” and other relevant commands? Is there a way you could enable something similar so that, for example, if I have an image on the clipboard, I could go into the text box manually, switch to the rotor, go to “Edit,” and then double-tap the “Paste” option? This would essentially put the picture into the box, just like how it works on the iPhone directly.
Right now, if I copy a picture directly to my clipboard from my iPhone camera, I can paste that image into the Notes app without a problem, but it doesn’t seem to work anywhere else. I don’t know if this could be implemented here, but it’s an option worth looking into.
As it stands, the edit menu is there, but none of the options show up. It would also be good to have some kind of text representation to show that there’s content in the box after pasting the image from the clipboard. Maybe it could display something like “Image” or even a short code such as JPG, GIF, PNG, or a series of numbers and letters. Basically, anything that would give an indication that there’s media content processing in the app. Having a completely blank box with no indication feels a bit strange, because there’s no way to know that there’s actually media there if you can’t see it.
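From what I understand, VoiceOver exposes exactly this through "custom actions", which show up under Actions in the rotor. A sketch of the idea, with made-up names rather than PiccyBot's real code:

```swift
import UIKit

// Sketch: a "Paste Image" entry in the VoiceOver rotor's Actions.
// `textBox` and the hand-off are illustrative.
func addPasteAction(to textBox: UIView) {
    let paste = UIAccessibilityCustomAction(name: "Paste Image") { _ in
        guard let image = UIPasteboard.general.image else { return false }
        print("Send \(image.size) off for description")  // placeholder hand-off
        return true
    }
    textBox.accessibilityCustomActions = [paste]
}
```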
Re: pasting
Thanks for the reply. I think the problem with VoiceOver is that it needs a specific child element to interact with and if it needs to be done on the background container then it becomes a bit tricky.
I wonder if popping up when an image is detected could get annoying. If I am using my phone I don't typically do much copy/pasting unless I am also on my Mac. So if I have an image in clipboard, it's likely to stay there for a long time. So if PiccyBot prompts me every time, then I would need to try to find some text to copy just to stop it happening?
Is there enough space on the screen to add a paste button amongst the other buttons, but only display it if there is an image to paste? Or maybe do something with the rotor actions?
Anyway, thanks again for this - once it becomes a bit easier this is going to be another really big advantage of PiccyBot compared to everything else. I usually ignore Facebook as I just feel excluded and I'm too lazy to save files all round the place just to have them described. I've been wanting something like this for ages.
Copying Descriptions to the Clipboard
Hi Martijn,
Could you please implement an easy way to copy just the image description to the clipboard? Right now, this is accomplished by pressing the Share button and copying the described text to the clipboard. Once I get to where I want to paste the description (usually into a message to someone), I have to do some editing to remove the link to PiccyBot. Would it be possible to add a "Copy" function to the VoiceOver Rotor when focus is on the text containing the description, and to please remove the PiccyBot app link from what is copied?
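Something like the following sketch is what I have in mind: a rotor action that puts only the description text on the clipboard (the names are illustrative):

```swift
import UIKit

// Sketch: a "Copy description" rotor action that copies only the
// description text, without the appended app link.
func addCopyAction(to descriptionLabel: UILabel, text: String) {
    let copy = UIAccessibilityCustomAction(name: "Copy description") { _ in
        UIPasteboard.general.string = text  // no PiccyBot link appended
        return true
    }
    descriptionLabel.accessibilityCustomActions = [copy]
}
```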
Re: Answers..
Thanks...
Question 1: Does the "None" option not disable the voice entirely? How does that let you use the system voice unless you have VoiceOver or Speak Screen enabled and use the appropriate gestures to hear the description? What I mean by "system voice", however, is the ability to have the system voice read out the description even if no such feature is enabled.
Question 2: What LLM does PiccyBot Mix use to perform text generation and produce the response? What do you mean by "elements"? Does PiccyBot Mix compare the text responses provided by the different models you mentioned, or does it compare the raw image processing results, find the elements common to all of them, and then generate the text response itself?
I don't want to be that person, but...
So the video description is done by Gemini? Why did I think it was a variant of ChatGPT? Wasn’t this a thing once, or did it change? I personally prefer the descriptions from ChatGPT, so I’m wondering if that could be done for video descriptions too.
Also, would it be possible—without making it overly complicated—to have a PiccyBot mix for videos as well? Basically, the idea would be to run the video through different models, then compile the details that at least three or more of them agree on. I know that’s probably super complex, and you’d have to figure out how to merge everything, but I think it could be really useful.
Enes, I'll look into system…
Enes, I'll look into system voices as part of the addition of new voices, soon. PiccyBot Mix currently uses Gemini 2.5 Flash, GPT-4.1 and Pixtral as models, and Gemini 2.0 Flash takes the elements that are present in each of these descriptions and puts together a combined description. These models were chosen for their speed, so the mix can generate an accurate description without taking too long. I may change them now with the introduction of GPT-5 and, soon, Gemini 3.
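Schematically, the Mix flow looks roughly like this. It is a simplified sketch in which query stands in for the actual provider API wrappers:

```swift
// Simplified sketch of the Mix flow. `query` is a hypothetical
// stand-in for the real provider API wrappers.
func query(model: String, prompt: String, image: Data?) async throws -> String {
    // ...network call to the given model would go here...
    return ""
}

func piccyBotMix(image: Data) async throws -> String {
    // Fan out to the three describer models in parallel.
    async let a = query(model: "gemini-2.5-flash", prompt: "Describe this image.", image: image)
    async let b = query(model: "gpt-4.1", prompt: "Describe this image.", image: image)
    async let c = query(model: "pixtral", prompt: "Describe this image.", image: image)
    let drafts = try await [a, b, c]

    // A fourth model keeps only the elements present in every draft.
    let combinePrompt = """
    Combine these three descriptions into one, keeping only the \
    elements that appear in all of them:
    \(drafts.joined(separator: "\n---\n"))
    """
    return try await query(model: "gemini-2.0-flash", prompt: combinePrompt, image: nil)
}
```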
Winter, a mix for videos can be done as well; let me experiment with that. But OpenAI doesn't do video descriptions, unfortunately. The only thing it can do is take screenshots at intervals and describe these separately, but that takes a long time and the result is not great anyhow.
An iOS update is under review by Apple currently. That implements the automatic paste in PiccyBot, which should help make it a far easier feature to use. The app should also be more stable.
Thanks all!
Suggestion: Adding our own API key
Just a suggestion request / throwing it out there.
I really want longer descriptions on videos. Even though I have it set to 100, it's still fairly short, at least textually. I was wondering, for those who want longer, more detailed descriptions, if we could maybe add our own API keys so you don't have as much server cost? I think that would be in some ways more cost effective for you, and give those who want really long descriptions, such as myself, another option to get those.
This would of course be an option, I'm not suggesting pivoting away from the way things are done at all, just an optional extra.
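For what it's worth, the usual way an iOS app holds a user-supplied key is the Keychain, so it stays encrypted at rest. A minimal sketch, with a made-up service name:

```swift
import Foundation
import Security

// Sketch: storing a user-supplied API key in the Keychain.
// The service name is made up for illustration.
func saveAPIKey(_ key: String) -> Bool {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: "PiccyBot.UserAPIKey",
        kSecValueData as String: Data(key.utf8)
    ]
    SecItemDelete(query as CFDictionary)  // replace any previously stored key
    return SecItemAdd(query as CFDictionary, nil) == errSecSuccess
}
```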
Longer video descriptions, and other thoughts
So I don’t know if this could work, but I had given you a list of suggestions at one point in the Facebook group, and you were like, “Yeah, these could totally work, but you have to keep it affordable.” I’m going to copy and paste some of that discussion again myself because I like having all my points in one place. Would any of these suggestions work here? I know you said they were feasible, but it really depends on the cost, and that’s the one aspect that might hold us back.

I’m going to paste the questions again just to refresh my mind and compare everything side by side. Could any of these work? For example, could you have Gemini analyze the video but then have ChatGPT give the output, or would that be too complicated? Let me explain.

The thing is, the description—even though I set it to 100—still ends up being very short when it comes to video descriptions. That’s why I thought ChatGPT was the one giving the descriptions at one point, because I know ChatGPT tends to give out longer text in comparison to Gemini. If I remember correctly, it’s probably the model that gives the longest descriptions overall. Sometimes I’ll get a nice, long description, and then other times I won’t. Even when I specifically prompt for a longer video description, Gemini tends to summarize the content instead, which I think it’s kind of known for. So that’s why I thought ChatGPT was the one providing the description, not Gemini.
Basically, I don’t know if it’s even possible to combine the models—have Gemini analyze the video, then have ChatGPT provide the actual text. Maybe that doesn’t make sense technically, but it would be nice to somehow get longer, more detailed descriptions for videos.

Also, when I ask follow-up questions about the video, even when I’m being very specific, I never seem to get the kind of response I actually want or need. I’m not sure if I’m doing something wrong here, or if it’s a limitation of the model. Like, I can analyze a video and get a fairly long description on the very first attempt. But if I analyze that same video again, I never seem to be able to get that same long, satisfactory description. I don’t know why, but artificial intelligence is kind of famous for this—you know, where the very first result you get is often the best one, and everything after that usually pales in comparison, sadly.

I know you said ChatGPT doesn’t do video descriptions—which is strange—but what about Claude? Could you do a mix between Claude and Gemini? You’re still working on the audio-description model, right? I haven’t seen much discussion about it on here lately, so that’s why I was wondering. Anyway, I’m excited to see where this goes—you’re doing great.
Right now, the app has quite a few models, which is great, but I was wondering if maybe the number of models could be cut down a bit. For example, ChatGPT might be great with backgrounds and emotional details, Claude might excel at emotions, and Gemini might be better with faces. So what if you could combine the best parts of each model into one, or maybe two or three models? What you could do is either combine the best parts of each model across all the companies into one unified model that brings together their strengths, or if that’s not feasible, you could combine the best parts of each model within each individual company and then offer one or two optimized models per company in the app. In other words, there are two main ways to approach this: One, create a unit of cross-company models that merge the strengths and compensate for the weaknesses of all available models from different providers. This would mean blending the best features from ChatGPT, Claude, Gemini, and so forth into one or two highly capable “super models.”
Two, if that’s technically too complex or not possible, then at least streamline by merging the best features of models within each company. For example, take the best elements from the various ChatGPT models and combine those into one or two main ChatGPT options; do the same for Claude, Gemini, etc. Then the app would offer a couple of optimized models per company, instead of overwhelming users with a long list of individual models. Also, you could improve the naming scheme. Instead of showing confusing model identifiers like “ChatGPT-4 mini” or “Gemini Flash 2.5 NIL,” you could show the company name plus a simple model name and a short description. For example: we would see clean, descriptive names such as:
• ChatGPT plus – Best for technical images and detailed descriptions
• Gemini Flash – Best for dynamic images with emotions and faces
• Claude Sonnet – Best for videos or images with text and graphic overlays
This would help users easily understand the strengths of each model without having to guess which version or identifier means what. Keep in mind that this is a general overview, but we'd still be able to use the models for whatever we wanted to have described.

Think about it—the models you have in the app are pretty similar, and most people usually stick to their one or two favorite models. I don’t think a lot of users are switching between models all that much. What I’m saying is, you could keep access to all the baseline models behind the scenes but not give users direct access to each one. All the processing could happen in the background, so users wouldn’t have to pick or switch models directly. You could even pull from other AI sources that don’t have chat interfaces or other AI developers who have strong description models but no conversational UI. We would get access to those capabilities through the underlying description process. Any missing details or aspects that don’t come through in the initial description could be fixed with follow-up questions. I don’t think this would sacrifice the quality of the descriptions.

You could also consider having the app pull from the two or three latest models from each company, but that might complicate things and risk losing some details early on. Each model keeps improving anyway, and honestly, many of them are pretty similar with only minor differences. I don’t think most users can tell the subtle distinctions between, say, one version of Gemini versus another, or each version of Claude or ChatGPT. Usually, models fit into a couple of broad styles, and the letters and numbers in their names don’t mean much to the average user. Kind of like what you did with the PiccyBot mix or the native blind style, but on a larger scale.
PiccyBot Mix
Don't know how fast it is, but Llama provides surprisingly accurate descriptions despite being open-source, even though it lacks the detail provided by GPT-5. It's still more detailed than Grok 4, and doesn't hallucinate like Claude 4 Sonnet. Reka, Pixtral and Amazon Nova fail to provide sufficiently detailed and accurate descriptions. So I'd say it's Llama instead of Pixtral that should be the third model beside a certain version of Gemini and GPT, provided it's fast enough. I will also list some other suggestions that I noted down as I continued to use the app. I'll have posted twice in a row, but I don't want to make things complicated, so I will share the suggestions in a separate post.
Suggestions
- The volume settings for the mixer are not saved permanently; they just revert to the defaults when I close and reopen the app.
- The audio description sounds okay when I hear it within PiccyBot, but the beginning is trimmed/truncated when I export the video mixed with the description. Also, the first sentence is heard at the very end: the description starts over from the beginning if it ends before the video does. Some finer adjustments seem necessary to better synchronize the description with the scenes and fit it within the video duration, so that it is neither truncated nor repeated from the beginning to fill the gap at the end, and so the end of the description aligns with the end of the video.
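On the first point, persisting the levels should only take something like the sketch below (the key names are made up):

```swift
import Foundation

// Sketch: persisting mixer levels across launches with UserDefaults.
// Key names are made up for illustration.
struct MixerSettings {
    static func save(descriptionVolume: Float, originalAudioVolume: Float) {
        let defaults = UserDefaults.standard
        defaults.set(descriptionVolume, forKey: "mixer.descriptionVolume")
        defaults.set(originalAudioVolume, forKey: "mixer.originalAudioVolume")
    }

    static func load() -> (description: Float, original: Float) {
        let defaults = UserDefaults.standard
        // Fall back to full volume on first launch.
        let d = defaults.object(forKey: "mixer.descriptionVolume") as? Float ?? 1.0
        let o = defaults.object(forKey: "mixer.originalAudioVolume") as? Float ?? 1.0
        return (d, o)
    }
}
```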
Grok 4 - what a blessing + some tips to Enes
Martin, thanks so much for adding Grok 4 to the model list, and for deciding that it will stay there! It does have a very low censorship level indeed! I have an image set of quite peculiar adult content. It forms quite a good "test set", as I was part of the experiences pictured in them, so I can form quite a good opinion on the descriptions given about them. Grok 4 didn't reject any of them, gave a very good initial description, listened accurately to all my follow-up questions, and clarified all the details I wanted to know! Gemini 2.5 Pro, my earlier "go-to" model, errored out on all of them, without saying a word about the reason of course (which was the adult content)!

And what's more, all this in my native tongue, Hungarian! The wording, grammar and expressiveness of Grok 4 in Hungarian is totally on par. There were mild description inaccuracies sometimes (not really hallucinations, because the described element was really there, albeit a bit differently). But 1, or at most 2, follow-up questions always cleared those up.

Earlier (in autumn of 2024, when I subscribed to lifetime PiccyBot) Pixtral was the model to choose for such images. But it didn't understand and generate in Hungarian, so I needed to change to English for such content. Furthermore, the descriptions of Pixtral weren't as good as Grok 4's, though by no means bad. I don't know, by the way, whether Pixtral has retained that low censorship level in its current iteration. So Grok 4 is truly a blessing, as I have the perfect right to "see" adult content, because I am way into my adulthood. And now I can do that with really high quality and in my native tongue. So thanks again, Martin!
To Enes: customizing the "what's in this image / video" prompt (which PiccyBot allows lately) and follow-up questions may be the real solution to some of your proposals. PiccyBot has recently got a prompt history feature, so you can recall the prompts you use / need often, like "Just read the text in this image accurately!" for OCR purposes (or its Turkish equivalent), etc. As we all know, we humans fortunately differ, so our needs and tastes do too, and visual content has an endless variety. And there are multilingual needs / setups for some people, like me. So I don't think that a heap of super-tailored modes / options is the way to go. There will always be a ton of cases that just don't fit. I prefer user freedom of creativity and experimentation more; although it requires some more work on the user's end, it may well be worth it. By the way, I am completely satisfied with how the model list looks right now: I find it neither too long, nor in any way uncomfortable or unpalatable.
See post: "Another Question"
I already suggested a solution I found useful to get descriptions in multiple languages. I didn't know Pixtral also provided uncensored descriptions by the way.
prompt storage increase?
Really enjoying all the new changes going into the app - the copy and paste option is awesome - and if we ever get a desktop/website version, it'd be really nice for sure.
BTW, not to take anything away from Piccy, but if you guys have been looking for a desktop uncensored describer option while waiting for Piccy to go that route, go to the miniapps website and search image describer, and you'll find lots of uncensored describers you can use. You have to pay a bit to use them, but for now it's a nice option.
Now to my main request :) Can we have more prompts stored? I want to have a set of them for videos and a set for images, but the allowed number ATM is basically only enough for my image-based prompts. Also, is there a way to make stored prompts accessible in the "ask more" section?
Thanks for the great work!
I downloaded the update. I have some questions
I downloaded the update a while ago. It’s on the App Store. The copy-paste feature is working pretty well here—whenever I copy a picture to the clipboard, once I enter the app, it prompts me to allow pasting. Hopefully, you can make it so it only asks once to paste instead of every time.
I still think having the copy-paste option on the rotor, if possible, would give us more control if I don’t want it to happen automatically. Because as it stands, like I said, a picture pasted in the box has no text indicator that it’s there, so if I wanted to ask a specific question about the picture before I start analyzing, how would this work? Once I paste the photo, it automatically sends the query. I don't want to be picky, but maybe this could be optional? Like, in the settings: do you want to paste manually, or automatically? Plus, when this issue is resolved, can I ask a follow-up question while the picture is pasted in the box, without overriding the picture? It seems silly, but I thought I would ask, just in case.
Update available - added pasting of video links
The latest update will also paste video links (from Facebook and YouTube mainly), which should make social media video description more flexible.
The volume settings for the mixer are now saved permanently.
Good luck, let me know how it works for you!
Thank you, and a question
Hey, the copy paste feature, with the links, is working very well.
I thought I would ask this question just in case — is there a way to enable a feature so that when I paste a link to an Instagram post, the AI could grab all of the photos from that post and describe them back-to-back?
What I’m thinking is: if I share the link to a post, it could automatically describe each image in sequence. But this could also be optional — so a person could choose to have it describe each picture automatically, or have it wait until they type something like “next picture” before moving on.
For posts with multiple images, maybe it could handle up to five at a time. If there are ten pictures in the post, it would describe the first five back-to-back, then wait for confirmation before describing the second five.
It could also provide a quick summary of each image in the post first, so the user could choose which ones they want more details about, instead of getting full descriptions for all of them by default. You could even make this customizable in the settings — for example, I might choose to always describe the first four pictures automatically, then wait for me to say “next” before going further.
Not sure how helpful this would be for others, but for me, when I’m on Instagram, sometimes a post has multiple pictures and I have to go through them one at a time to get descriptions. This kind of feature would make it much smoother. Just thought I’d throw it out there.
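To make the idea concrete, here is a rough sketch of the batching flow I mean; describe and waitForUserConfirmation are imaginary stand-ins for however the app would actually do it:

```swift
// Sketch of the proposed flow: describe images five at a time,
// pausing for confirmation between batches. `describe` and
// `waitForUserConfirmation` are imaginary stand-ins.
func describePost(images: [Data],
                  describe: (Data) async -> String,
                  waitForUserConfirmation: () async -> Bool) async {
    for batchStart in stride(from: 0, to: images.count, by: 5) {
        for image in images[batchStart..<min(batchStart + 5, images.count)] {
            let text = await describe(image)
            print(text)  // or speak it aloud
        }
        // Wait for "next picture" before the following five.
        if batchStart + 5 < images.count, await waitForUserConfirmation() == false {
            break
        }
    }
}
```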
re: batch photos
"is there a way to enable a feature so that when I paste a link to an Instagram post, the AI could grab all of the photos from that post and describe them back-to-back?"
Totally agree - a batch download/describe would be very helpful for me too, since I often deal with pages with multiple photos of the product I'm trying to buy, and doing them one at a time is very tedious.