Hello guys,
I have created PiccyBot, a free app that speaks a description of any photo or image you give it. You can then ask detailed questions about it.
I have adjusted the app to make it as low-vision friendly as I could, but I would love to receive feedback on how to improve it further!
The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317
I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully more useful!
Thanks and best regards,
Martijn van der Spek
Comments
Great app for describing videos
This app is great for having videos described.
Audio description?
Hello Martijn!
I don't know if it's feasible, but could the app describe what is happening at each moment while the video plays, like an audio description? Almost like Seeing AI, but without pausing to describe.
Of course, it would have to give less information, but I think it would be amazing if it could do that. As it is now, reading the text or listening to the description while the video plays makes it hard to keep the two in sync.
How to describe a YouTube video?
Hello Martijn!
I noticed that version 2.14, released today, mentions the ability to describe YouTube videos. This is truly a great upgrade. However, I haven't found a way to use this feature. I can't find PiccyBot in the share menu for YouTube videos, and copying and pasting the YouTube video link into the app doesn't work either. Can you explain how to use this feature?
Many thanks
2.14: what are the available shortcuts?
Hello Martijn,
I noticed in the 2.14 release notes that some shortcuts were also introduced. Great news! Could you give us some details on exactly which phrases they use?
Sharing YouTube and Shortcuts
Carter, PiccyBot will describe videos that are shared to it. In the case of YouTube, go to Share, then select PiccyBot. The first time, it will be hidden under 'More...', and then under 'More...' again. After that, it will show up earlier in the list.
However, there appears to be a glitch in the YouTube app at the moment, likely related to the iOS update, that causes the share function not to work. For some people it has started working again, so please be aware of that. Using YouTube in Safari works fine.
Laszio, PiccyBot now has shortcuts, but this is an initial release. Please check it out and let me know what to improve or add.
Right now, with the 2.14 update, Siri will recognize these phrases to trigger the camera shortcut:
"Siri, Open PiccyBot camera",
"Siri, Launch PiccyBot camera",
"Siri, Start PiccyBot camera"
Siri will recognize these phrases to trigger the video recorder shortcut:
"Siri, Open PiccyBot video recorder",
"Siri, Launch PiccyBot video recorder",
"Siri, Start PiccyBot video recorder"
However, again there seems to be a glitch: it doesn't work for everyone, likely because of the iOS 18 updates to enhance Siri. So I haven't announced this functionality yet; let's test it a bit further first.
Shortcuts + uncensored model proposition
Hello Martijn,
Thanks very much for the info. I tried out all the shortcuts you mentioned. Siri didn't reject any of them, but in response to all of them it opened the main interface of PiccyBot instead of the camera or video recorder. Note that I deliberately run an iOS version older than 18.
Thanks very much for the language selection fix in 2.14. Now my native tongue, Hungarian, can be found and selected in the language list, and not just by setting the language to the system language.
While browsing on Huggingface I found the following very promising uncensored model:
https://huggingface.co/huihui-ai/Qwen2.5-VL-7B-Instruct-abliterated
This is the uncensored version of a very recent Qwen 2.5 vision model (the base model was developed by Alibaba Group). This is the 7-billion-parameter variant, so computationally it is on par with the Janus Pro model that you currently run on your server. According to its descriptions, it is quite a versatile and strong model even at this size, and multilingual too. It can process images at their native resolution and can be fine-tuned if needed. It can even process videos (even long ones) if inference is set up accordingly.
Personally, I found nothing spectacular or special about the Janus Pro model, and this one is uncensored, so if feasible I propose running it alongside, or even instead of, Janus Pro on your server. Thanks in advance for considering this.
My YouTube description was successful
Hello Martijn,
Thank you very much for your answer. I tried it just now and successfully selected PiccyBot from the share menu on YouTube. However, it's strange that when I posted a question to you a couple of days ago, I used the same method but it didn't work. Anyway, it's working now and it's functioning well.
Significant improvement in DeepSeek AI Janus
DeepSeek R1 is wonderful, but my first impression of Janus a while ago was "oh well". It seems to have improved very significantly; it now gives a comprehensive description of the photo. Martijn must have done some good work on it.
Summary of features
Since a lot of features have been gradually added to PiccyBot over the past year, I thought it would be helpful to summarize the app's current features, for both the free and subscribed versions:
For all users (Free & Subscribed)
- Convert Photos and Videos to Descriptions: Upload media, and PiccyBot will generate detailed audio descriptions.
- Ask Follow-up Questions: Engage in a conversation with PiccyBot for specific details about the selected media.
- Background Processing with Notifications: Continue using other apps while PiccyBot processes results in the background.
- Language Selection: PiccyBot uses your phone's system language for descriptions and instructions.
- Full Localization & VoiceOver Support: Assistive navigation for visually impaired users.
- Social Media Sharing: Share photos and videos directly from apps like Instagram, Messenger, Facebook, Reddit, YouTube, TikTok and X (non-private accounts).
- Dedicated Chat Screen: Chat with PiccyBot for detailed insights about your image or video.
- Siri & Shortcuts Support: Instantly launch the camera or video recorder via Siri commands or a dedicated shortcuts button: 'Siri, open PiccyBot camera' and 'Siri, open PiccyBot video recorder'.
- Quick Camera Access with Volume Button: Press the volume button to capture photos directly within the camera.
- Separate Buttons for Media Access: Access the camera, photos, video recorder, or video library directly with dedicated buttons.
- Save Descriptions as Metadata: Embed generated descriptions directly into the media file's metadata in the Photos app.
- Video Limits: Free users can process up to 1 minute of video content.
For Subscribed users:
- No ads
- Video Limits for Pro Users: Process videos up to 10 minutes long for downloaded, uploaded, or in-app recorded videos.
- YouTube Support:
Videos shorter than 10 minutes are downloaded to your phone and then described.
Videos longer than 10 minutes are described directly without downloading. Fast, but you can't mix the audio afterwards.
- Advanced Settings to customise PiccyBot's output:
Voice Selection: Choose from multiple voices for descriptions.
Personality Mode Switching: Customize the narration style.
Talkback Speed Control: Adjust the pace of audio descriptions.
Model Selection: Select which AI model to use. Each model has its own strengths and weaknesses.
Description Length Control: Decide how detailed or brief the descriptions should be.
Video Upload Quality Control: Manage upload resolution for better quality or faster processing.
Process Feedback Sound (On/Off): Enable or disable sound notifications for processing completion.
Audio-Video Mixing Controls: Adjust video and generated audio volumes independently (e.g., 30% video volume and 90% audio description volume).
- Multiple Sharing Options:
Audio only
Video with optional Audio Mix
Description Only
- Audio-Video Mixing: Combine the original video audio with generated audio descriptions.
- Language Selection: Choose from 55 languages in the PiccyBot settings for descriptions.
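Conceptually, independent volume controls like "30% video, 90% description" amount to scaling each track by its own gain and summing the results. The Python sketch below illustrates that idea only; it is not PiccyBot's code. The function name `mix_tracks`, the use of mono float samples in the range -1.0 to 1.0, zero-padding the shorter track, and hard clipping are all assumptions made for the example.

```python
def mix_tracks(video_audio, description_audio, video_gain=0.3, description_gain=0.9):
    """Hypothetical linear mix of two mono tracks with float samples in [-1.0, 1.0].

    Each track is scaled by its own gain (0.3 = 30%, 0.9 = 90%) and the
    scaled samples are summed. The shorter track is padded with silence.
    """
    length = max(len(video_audio), len(description_audio))
    mixed = []
    for i in range(length):
        # Treat missing samples past the end of a track as silence.
        v = video_audio[i] if i < len(video_audio) else 0.0
        d = description_audio[i] if i < len(description_audio) else 0.0
        sample = video_gain * v + description_gain * d
        # Hard-clip so loud overlaps stay within the valid sample range.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


mixed = mix_tracks([0.5, 1.0, -1.0], [0.5, 1.0])
print(mixed)
```

A real mixer would more likely work on buffers of audio samples and might lower the video track only while the description is speaking ("ducking") instead of clipping, but a fixed per-track gain is the simplest reading of the setting described above.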
In addition to this, there is the PiccyBot WhatsApp service, to which you can send any image, video or website link for an audio description.
Phew, that was it I think! Hope this helps in case you missed or forgot any of the features.
Good luck with PiccyBot, I really appreciate the feedback given in this forum, it has genuinely been a group effort to get to this stage!
Android bug
Hello guys!
I don't know if I can report bugs in the Android version in this thread. I'm doing this since the developer looks here a lot, so it's easier for us to get support.
If necessary, I'll send it somewhere else.
I'm facing a bug where the description audio isn't being played.
Hi Diego, can you message me with the details of your device and Android version? All latest PiccyBot features should work on the Android version as well.
Recent Spike in Server Errors
I've been using this app for a couple of months now and like it, but I have noticed that it very often fails to process YouTube videos. Either it gives me the generic "server error" message, or it simply fails to process; a retry sometimes works and sometimes doesn't. I do pay for the subscription, and I have tried multiple videos and multiple models. I ran one of the same links through Gemini Flash to see whether the model was the problem, and it gave me a very good description of the video broken into time-stamped segments. So the requests must not be going to the models directly, which makes the app seem unreliable at the very least. I've read through all the comments here and haven't seen much about this, so either I have had the misfortune of trying at bad times, or it just isn't a widespread issue.
Can you summarize the characteristics of various large models?
Hi Martijn,
Thank you for continuously adding new features to the app. As you mentioned in the post above, the app now has many features, especially providing many models for users to choose from. Could you please explain the advantages and disadvantages of each large model when describing pictures and videos? Of course, I know this question sounds a bit subjective, and perhaps everyone's opinion will be different, but I would like to hear your opinion, and I think there will be other friends who, like me, would like to have a reference answer.
Taking pictures
Can you make the app guide the user so that they can take pictures with it?
Re: Taking pictures
Do you mean like Google Pixel's "Guided Frame"? That would be sweet!
exactly!
I've been wanting something like that forever. But I guess you'd need live AI, I mean truly live AI, for that?