New AI app for describing images and video: PiccyBot

How to describe a YouTube video?

Hello Martijn!
I noticed that the 2.14 version updated today mentions the ability to describe YouTube videos. This is truly a great upgrade. However, I haven't found a way to use this feature. I can't find PiccyBot in the sharing menu of YouTube videos, and copying and pasting the YouTube video link into the APP doesn't work either. Can you explain how to use this feature?
Many thanks

2.14: what are the available shortcuts?

Hello Martin,
I discovered in the 2.14 what's new that some shortcuts were also introduced. This is a great joy. But please give us some details, what phrases they are exactly.

Sharing YouTube and Shortcuts

Carter, PiccyBot will describe videos that are shared to it. In the case of YouTube, go to share, then select PiccyBot. The first time it will be hidden under 'more..' and then again 'more..' to find it. After that it will show up earlier.
However, there appears to be a glitch in the YouTube app at the moment, likely related to the iOS update, that somehow causes the share function not to work. For some people it has started to work again. So be aware of that please. Using YouTube in Safari works fine.

Laszio, PiccyBot now has shortcuts, but it is an initial release. Please check it out and let me know what to improve or add?

Right now, with the 2.14 update, Siri will recognize these phrases to trigger the camera shortcut:

"Siri, Open PiccyBot camera",
"Siri, Launch PiccyBot camera",
"Siri, Start PiccyBot camera"

Siri will recognize these phrases to trigger the video recorder shortcut:

"Siri, Open PiccyBot video recorder",
"Siri, Launch PiccyBot video recorder",
"Siri, Start PiccyBot video recorder"

However, again there seems to be a glitch, it doesn't work for everyone, likely related to the iOS 18 updates to enhance SIri. So I didn't announce this functionality yet, let's test it a bit further first.

Shortcuts + uncensored model proposition

Hello Martin,
Thanks much for the info. I tried out all the shortcuts you had mentioned. Siri didn't reject any of them, but opened the main interface of Piccybot in reply to all of them instead of the camera or video recorder part. I note that I run an iOS older than 18 on purpose.
Thanks much for the language selection fix in 2.14. Now my native tongue, Hungarian can be found and selected in the language list, and not just by setting language to system language.
While browsing on Huggingface I found the following very promising uncensored model:
https://huggingface.co/huihui-ai/Qwen2.5-VL-7B-Instruct-abliterated
This is the uncensored version of a very recent Qwen 2.5 vision model (the base model is developed by AliBaba group). This is the 7-billion-parameter variant, so computationally it is on par with the Janus Pro model that you currently run on your server. According to descriptions this is a quite versatile and strong model even in this size and multilingual too. It can process images in their native resolution and can be finetuned if needed. It can even process videos (even long ones) if the inference is done accordingly.
Personally I found nothing spectacular or special about the Janus Pro model, and this one is uncensored, so if feasible I propose to run this one besides, or even instead of Janus Pro on your server. Thanks in advance for considering this.

My YouTube description was successful

Hello Martin,
Thank you very much for your answer. I tried it just now and successfully selected PiccyBot from the share menu on YouTube. However, it's strange that when I posted a question to you a couple of days ago, I used the same method but it didn't work. Anyway, it's working now and it's functioning well.

significant improvement on deepseek ai janus

deepseek r1 is wonderful, but my first impression of janus a while ago was oh well. it seems to have improved very significantly, now it gives a comprehensive description of the photo. Martijn must has done some good work about it.

Summary of features

Since over the past year a lot of features were gradually added to PiccyBot, I thought it would be helpful to give a summary of the current features of the app, for both the free and subscribed version:

For all users (Free & Subscribed)

- Convert Photos and Videos to Descriptions — Upload media, and PiccyBot will generate detailed audio descriptions.

- Ask Follow-up Questions — Engage in a conversation with PiccyBot for specific details about the selected media.

- Background Processing with Notifications — Continue using other apps while PiccyBot processes results in the background.

- Language Selection — PiccyBot uses your phone’s system language for descriptions and instructions.

- Full Localization & VoiceOver Support — Assistive navigation for visually impaired users.

- Social Media Sharing — Share photos and videos directly from apps like Instagram, Messenger,Facebook, Reddit, YouTube, TikTok and X (non-private accounts).

- Dedicated Chat Screen — Chat with PiccyBot for detailed insights about your image or video.

- Siri & Shortcuts Support — Instantly launch the camera or video recorder via Siri commands or a dedicated shortcuts button. 'Siri, open PiccyBot camera' and 'Siri, open PiccyBot video recorder'

- Quick Camera Access with Volume Button — Press the volume button to capture photos directly within the camera.

- Separate Buttons for Media Access — Access the camera, photos, video recorder, or video library directly with dedicated separate buttons.

- Save Descriptions as Metadata — Embed generated descriptions directly into the media file's metadata in the Photos app.

- Video Limits — Free users can process up to 1 minute of video content.

For Subscribed users:

- No ads

- Video Limits for Pro Users — Process videos up to 10 minutes for downloaded, uploaded, or in-app recorded videos.

- YouTube Support:
Videos shorter than 10 minutes are downloaded to your phone and then described.
Videos longer than 10 minutes are described directly without downloading. Fast, but you can't mix the audio afterwards.

- Advanced Settings to customise PiccyBot's output:

Voice Selection — Choose from multiple voices for descriptions.
Personality Mode Switching — Customize the narration style.
Talkback Speed Control — Adjust the pace of audio descriptions.
Model Selection — Select which AI model to use. Each model has its own unique strengths and weaknesses.
Description Length Control — Decide how detailed or brief the descriptions should be.
Video Upload Quality Control — Manage upload resolution for better quality or faster processing.
Process Feedback Sound (On/Off) — Enable or disable sound notifications for processing completion.
Audio-Video Mixing Controls — Adjust video and generated audio volumes independently (e.g., 30% video volume and 90% audio description volume).

- Multiple Sharing Options:
Audio only
Video with optional Audio Mix
Description Only

- Audio-Video Mixing — Combine the original video audio with generated audio descriptions.

- Language Selection — Choose from 55 languages in the PiccyBot settings for descriptions.

In addition to this, there is the PiccyBot Whatsapp service, to which you can send any image, video or website link for an audio description.

Phew, that was it I think! Hope this helps in case you missed or forgot any of the features.

Good luck with PiccyBot, I really appreciate the feedback given in this forum, it has genuinely been a group effort to get to this stage!

Androie bug

Hello guys!
I don't know if I can report bugs in the Android version in this thread. I'm doing this since the developer looks here a lot, so it's easier for us to get support.
If necessary, I'll send it somewhere else.
I'm facing a bug where the description audio isn't being played.

Hi Diego, can you message me…

Hi Diego, can you message me with the details of your device and Android version? All latest PiccyBot features should work on the Android version as well.

Recent Spike in Server Errors

I've been using this App for a couple months now and like it, but have noticed that it very often fails to process Youtube videos, either giving me the generic, "server error," message or just failing to process but then a retry will sometimes work, sometimes not. I do pay for the subscription. I have tried with multiple videos and multiple models. I tried running one of the same links through Gemini Flash to see if it was an issue with the model, but it gave me a very good description of the video broken up into time stamped segments, so the requests must not be going through the models directly which seems to make the App unreliable at the very least. I've read through all the comments here and haven't seen much about this, so either I have the misfortune of trying at bad times or it just isn't a widespread issue.

Can you summarize the characteristics of various large models?

Hi Martijn,
Thank you for continuously adding new features to the app. As you mentioned in the post above, the app now has many features, especially providing many models for users to choose from. Could you please explain the advantages and disadvantages of each large model when describing pictures and videos? Of course, I know this question sounds a bit subjective, and perhaps everyone's opinion will be different, but I would like to hear your opinion, and I think there will be other friends who, like me, would like to have a reference answer.

Taking pictures

Can you make the app guide the user so that they can take pictures with it?

Re: Taking pictures

Do you mean like Google Pixel's "Guided Frame"? That, would be sweet! 😃👍

exactly!

I've been wanting something like that for ever now. But I guess you'd need live AI, I mean truly live AI for that?

I often get errors when using the Chinese voice

Hello Martijn,
I found another bug in an app. I usually use Simplified Chinese, and I noticed that no matter which voice I choose, I often encounter situations where the content of the image description cannot be read out. The specific manifestation is that they will keep saying "Chinese Letter Chinese Letter Chinese Letter" until I have to pause the voice for it to stop. This is really frustrating because it happens frequently, about once for every five images I recognize this bug occurs. Could you find some time to check what's going on with this?

re: I often get errors when using the Chinese voice

I noticed that too using the Chinese languages. Quite often the description gets stuck and starts repeating itself like a broken CD LOL

Gemini Pro 2.5 model added

Hi guys,

I have added the Gemini 2.5 Pro model today. It scores amazing in the benchmarks and my own tests so far have shown it to be really good in video and image descriptions. Check it out!

James, I have not seen any spike in YouTube processing errors, but will monitor it closely, it is one of the popular uses of the app and maybe it affects certain time zones more than others.

Naza, Brian, Gokul, thanks for the suggestion, will see if I can replicate guided frame on iOS.

Carter, Privateai, so far the Chinese voice works for me, but will keep checking. Let me know if it continues to give problems?

Thanks for the feedback as always!

Guided frames

@Martijn if you replicate Guided frames or in other words make an accessible camera app for IOS, you would be doing a path-breaking, pioneering service for IOS users with visual impairment, not to mention facilitating their inclusion in mainstreme in a certain huge way. It need not be a feature within piccibot, rather it can be a stand-alone app, which can help us frame and take decent pictures and save to galary. I would be more than willing to contribute to it in any way I can.

Llama 4 added

Hi guys,

Llama 4 Maverick has been added to PiccyBot, and is now available to subscribed users. It has 17B active parameters, 128 experts. It is one of the fastest models for image descriptions. So far, the descriptions are looking very accurate to me, but let me know what you think?

re:Llama 4 added

I think Llama 4 has the fastest processing speed among these models, but its description effect is not as detailed as Gemini Pro2_5. However, considering both processing speed and description quality, Llama 4 is also a good choice. In my ranking of these models, it's Gemini Pro > Llama 4 > Deepseek.

Mixture of models added

I have added a new model 'PiccyBot Mix' to the available AI models for subscribed users.
This model is a mixture of models. The idea is that the models check each other. Only elements that are described by each model will be included in the description. The aim is to completely remove any hallucinations.
Note that the description of this model will likely be less detailed than individual models, but it should be fully reliable. Also note as of now this only works for image descriptions, not for video descriptions.
Please try it out and judge the accurateness of the description?

Interesting

Interesting addition, thanks. Completely understand why it's image only, video would both be more complicated, require more resources from the AI servers and you need to confirm how well it works before you even consider it.

This is brilliant!

Especially for situations where what you want is an accurate description and where halusinations can be problematic and/or dangerous.

Hallucinations

I think, if you have truly found a way to remove these, you will become everybody’s new best friend.

Awesome!

This is something I've hoped would come along now when we have many good models that can be compared. Seems to work fine, but hard to say if it does not hallucinate of course. A possible further development: having a "fast" and a "thorough/detailed" mix, with modeels tailored to each?
Thank you very much for this!

I don’t think

Hi, I have downloaded the app but after using it for a while, I deleted it from my iPhone. I have Be My Eyes and Seeing AI so I feel like this app isn’t useful

@Andrew

No app will be useful to everyone depending on their situation and needs, that's perfectly valid. Just bear it in mind as a potential tool in the future if you need a short video described, or if you encounter a situation where the AI models in Seeing AI or Be My AI aren't helpful.

Screen sharing on the iPhone

Hi there,

I know this might be a bit of an unusual request, but I wanted to put it out there anyway—because if you don’t ask, you’ll never know what’s possible.

Lately, I’ve been using a few survey websites to earn gift cards like Amazon gift or prepaid Visa cards. These help me make online purchases without needing to use my debit or credit card. One of the main platforms I use is powered by Spectrum Surveys, and while the surveys themselves are accessible most of the time, there’s one issue I keep running into: drag-and-drop tasks.

These questions usually ask me to move items—like dragging an image or word into a specific box or column—but there’s no reliable feedback about what I’m dragging or where I’m dropping it. This makes it nearly impossible to complete certain sections.

Here’s where the idea comes in. I was wondering if it would be possible to create a feature that allows me to share my screen and receive descriptive feedback about what’s happening. For example, if I’m dragging “Apple” into a “Fruit” box, the system could announce something like:
“Dragging Apple. Drop zone: Row 1, Column 1 – Fruit.” And once I release it: “Apple dropped in Fruit.” If I switch to dragging “Blue” into a colors section, it could say: “Dragging Blue. Drop zone: Row 1, Column 2 – Colors.” Then confirm: “Blue dropped in Colors.”

It doesn’t need to control anything for me—I’m not expecting the AI to do the action itself. I’d simply like to hear clear, accurate feedback so I know what I’m doing on screen. This could work in combination with screen recognition when standard reading fails, which is something I already use regularly.

I think having real-time feedback on selections and drop targets would significantly improve my experience and make these survey platforms far more accessible. This kind of assistance wouldn’t fix every challenge, but it would eliminate one of the biggest barriers I run into.

Also, I plan on purchasing the lifetime subscription soon—probably by Friday. I have a $25 prepaid Visa card ready, and I believe the lifetime cost is around $20. Regardless of whether this feature is possible now or in the future, I’m fully committed to supporting the app. I figured I’d share this idea in case it sparks anything or aligns with something already in development.

If this kind of feature is technically possible, it would be a game-changer for me. And if others feel the same way, hopefully they’ll chime in and add their thoughts as well.

Thanks for your time and for building something that’s already made such a difference. I’m looking forward to what comes next.

Tips killing my prompt history!

I like to keep some prompt histories so I don't have to retype them all the time, or go copy them and paste them in from elsewhere. I have noticed the latest version, the APP keeps randomly refreshing as if I closed it and re-opened it, and that keeps erasing my prompt history. Is it those random tips that's doing that?

Recently, this APP has been a bit unstable

I don't know if you've noticed, but the app has been having some issues lately. First, no matter which voice I set for text-to-speech, it won't read aloud anymore. I have to use VoiceOver to browse the results instead of hearing the app automatically read them out as before. Second, the app is increasingly failing to recognize images. In more and more cases, I upload a picture, wait through a long sound effect, and end up with nothing on the screen. It's a bit frustrating. Still, I'm willing to support it until these issues are fixed by the developer.

I'm impressed!

Hi,
I just downloaded this app yesterday, to get extract all text from screenshots of recipes I find on Facebook.
The descriptions are great, even with the free version.
I'll definately be supporting the developer, when I'm able, by purchasing a subscription or maybe even the lifetime option.

Keep up the incredible work! :)

Having problems? I thought it was only me

I’ve actually been talking to the team about this recently. They responded with something like, “Thanks for bringing it to our attention,” and I suggested they really should reach out to others to see if the issue is more widespread. For me personally, it’s not so much an issue with images—it’s more with videos. I usually use it to describe music videos. What happens is, I’ll get one video described just fine, but when I try to run a second one right after, it stops working. It gets stuck on one of those loading screens—I’ll hear the waiting sound, or it’ll say “please wait” or “fetching data,” but then nothing actually happens. It’s pretty frustrating. The only workaround I’ve found so far is to screen record the video and then have that recording described. But obviously, that’s a hassle. So clearly the issue isn’t the video itself, because the screen-recorded version works fine. It might have something to do with how Apple’s files are handled, I don’t know. The team did acknowledge that there’s something going on and said they’re looking into it. At first, I thought it was just me, but apparently not—there does seem to be a real issue here. The pattern’s pretty consistent: one video gets described, then trying another one right after just doesn’t work. I get that this probably takes up a ton of processing power, so maybe there needs to be a buffer or delay, but even when I wait like 10 minutes—or even an hour—nothing happens. And when I say “nothing happens,” I don’t mean it’s sitting there thinking. I mean the video doesn’t even get recognized. If it doesn’t work within the first few minutes, I usually give up because I know it’s not going to load at all. So yeah, that’s definitely something that needs to be addressed. Overall, the app is great and super useful, but this video issue is a real limitation right now.

Issues

Hi Carter Wu, there was an issue with the speech last week while we were trying to upgrade the speech engine. For now, this has been rolled back, it should work as before again. Aiming to add the new voices with an update next week instead, after full testing.

Winter Roses, there has been an issue with PiccyBot getting stuck while trying to download private/copyrighted YouTube videos. It should have given an alert, but it didn't. In the next update (within a week), this will be handled differently. Private videos will be described directly, without downloading them. Note that you won't be able to do any audio mixing on those YouTube videos, but at least you'll get a description of them.

Thanks for the feedback!

Many thoughts

As a voice actor, I hate AI audio description. AI voices in general make me mad because not only do they just not usually sound very good, but they are a crutch that takes away from my vocation in the worst way.

As an audio editor, I utterly loathe how audio ducking is used with audio description. It is almost never ever justified in any way and takes away from music and sound as someone said. I will die on that hill speaking about this practice. Don't do it.

But as an avid gamer and lover of AI , I'm all for this. I'd rather have AI than no audio description. And if AI can describe in real time and be customized, that would be really cool. Right now I really want to have game cutscenes audio described. There still isn't a practical way to do this. I thought Google Gemini was on the verge, but not yet. ANd Pixie Bot is pretty interesting, even if I have to essentially watch the cutscene twice or three times. It too is not all that practical because I ahve to record the video with my phone, or upload it to YouTube first. I love some of the customization suggestions listed above. I Think AI has a lot of potential for audio description. I also love the idea of pausing a scene and getting more in-depth description. So much audio description is surface level at best. It gives an overview, but sometimes a bit more dept would be nice.

Other ideas, maybe?

OK so I have another couple of ideas—I don’t know if anybody else actually wants this personally, but I figured I’d just throw it out there in case it’s something that can be done. And I really hope this doesn’t come across as annoying or anything because I feel like I’m asking for a feature that maybe nobody else would use or even think about. But hey, maybe by putting it out there, something happens, right? So first off—I’m really excited for the new update. I’m currently on the one-month plan and I do have the money for the lifetime subscription, I’m holding off a little bit until some of the current issues get sorted. But yeah, as it stands now, I’m honestly very satisfied with what I’ve got. One thing I saw mentioned was that there are going to be some new voices—which I’m genuinely excited about! So here’s my slightly selfish ask: would it be possible to somehow sign into Amazon Kindle and use one of these voices to read Kindle books directly? I know some text-to-speech apps let you do this, and I’m not sure if it would clash with your current goals for the app or if it’s something that can even be implemented, but I figured it couldn’t hurt to ask.

Basically, I’d love a feature where I can sign into my Kindle account and have one of your voices read the books in the library. The voices are really good—I hope they keep getting better. Honestly, one of the best I’ve heard so far in the TTS space is Speechify. I don’t know where they get their voices, but the quality is top-tier. It's pretty expensive though. Seriously, $140 per year? I don't know if it's because it's promoted by MrBeast, Gwyneth Paltrow, and Snoop Dogg, but, yeah. Must be for the rich. They seem to be promoting as a product for individuals with dyslexia, but again, super expensive. I’d even love the option to create virtual voices. I’ve also seen features in other apps where you can turn text or books into podcasts using virtual voices—kind of like generating a little audio book from your reading material. That would be amazing. Again, not trying to step on any audiobook publishers’ toes here—that’s not my intention. Like, imagine clicking on your Kindle book (once linked), choosing one of the voices, and having it read to you right in the app—no downloads, just direct reading. And maybe with uploaded files or personal documents, there could be more freedom. That’d be next-level accessibility.

Anyway, I know this might be a long shot or not a priority, but thought I’d throw it out there. Thanks again for listening and for continuing to improve the app—it really does make a difference.

Youtube

Could anyone remind me how getting youtube videos described works? I tried sharing it from the youtube app to picciebot and it just says it isn't valid no matter what video I try using.

Latest update

Hi guys,

There is an update in the App Store available that improves YouTube video handling (This should fix your issue Icosa).
A new set of onboarding screens has been added to guide new users using the app.
In addition, the personality voices have received an update. They will be less extreme but still have extra intonation and style. Try them out if you are subscribed to the app.
Also, the latest Gemini Pro 2.5 I/O model has been added to the list. It's good, but slow, so keep that in mind.

Good luck!

Models in the update

Hi Martin,
First of all thank you for the update!
There are more changes in the models list than "just" adding Gemini Pro 2.5 I/O.
A model with an interesting name has also appeared there: "Native Blind Style". What's that exactly?
Furthermore there are two entries named "GPT4O mini" (next to each other in the list). And Deepseek Janus seems to be gone. Is this just a mistake and one of them is still Deepseek Janus actually? If I remember well Deepseek Janus used to be after GPT4O mini in the list, so that gave me the idea to ask this.
Thanks!

Models

Hi Lazlo, the 'Native Blind Style' is a model based upon GPT o4-mini that describes images with the perspective of someone who has been blind their entire life. The focus is on how objects feel, mention of colours is avoided, etc.

And GPT o4-mini is different than GPT 4o-mini. The latter is a small version of GPT4o while the former is a small version of O4. GPT o4-mini is actually the current latest model of OpenAI. Their naming is bizarre, they agree on that themselves. But it is what it is now.

DeepSeek Janus has been running on a local machine that I now use to develop a voice driven assistant agent for the blind. This will be a separate product from PiccyBot. DeepSeek will be back when I find a good place to host it..

Good luck with the update!

Models

Hi Lazlo, the 'Native Blind Style' is a model based upon GPT o4-mini that describes images with the perspective of someone who has been blind their entire life. The focus is on how objects feel, mention of colours is avoided, etc.

And GPT o4-mini is different than GPT 4o-mini. The latter is a small version of GPT4o while the former is a small version of O4. GPT o4-mini is actually the current latest model of OpenAI. Their naming is bizarre, they agree on that themselves. But it is what it is now.

DeepSeek Janus has been running on a local machine that I now use to develop a voice driven assistant agent for the blind. This will be a separate product from PiccyBot. DeepSeek will be back when I find a good place to host it..

Good luck with the update!

Thanks much!

Thank you for the info and the very quick response! Ah yes, the names of the two GPT variants sounded so similar to each other with the Hungarian voice of VoiceOver that I perceived them to be the same. Now I have just had a closer listen (and had them spelled out) and I see the difference.

Thanks

Very much appreciated. Is there any chance you can signpost which models are useable for video or does the app always use a preset model without the ability to change it?

Image and video

Hi Icosa,
Right now, Amazon Nova Lite, Amazon Nova Pro, Gemini Flash 2.5, Gemini Flash 2.0 Lite, Gemini Pro 2.5 I/O and Reka will give video descriptions. For any others not in this list, it will default to Gemini Flash 2.5. So if you set GPT O4-Mini it will use that for image descriptions, and Flash 2.5 for video. If you set Pro 2.5 I/O it will use that for both image and video.

Hope that helps!

Re: Update, personalities won't turn off

First I'll say that the voices do sound better with personalities on, but... Once I switch the personality on, it won't switch off even if the toggle says off- I have to physically restart the app after turning off personality for it go to away.
Also, how come the personalities are more censored than no personality? I had a piece of erotic art described, and when I tried to ask questions with personality on, it show the text on the screen but the voice just kept repeating "Sorry I can't describe that" over and over and over again...like it was stuck.

possible bug with voice personalities?

Hi! First of all, I have to say that I absolutely love that Piccy bot can describe youtube videos now. I know it's kind of old news at this point, but I only gave it a try and got it working yesterday.

Unfortunately, I think I may have stumbled across a bug. Normally, I have personality for voices turned off; at least in the past, the personalities didn't really add anything, at least not in my opinion. However, when I read that the personalities had been revised, I wanted to give them a try, most specifically for describing Youtube videos. Unfortunately, when I request description of a video and personalities are enabled, a text description appears, but nothing is ever spoken, even if I use the play button. However, when I turn personalities back off, youtube videos are being described just fine.

I'm using the Gemini 2.5 model; think a previous commenter may have been on to something when pointing out that models are not labeled accurately at the moment. I think I'm using gemini pro, but it says flash? It works well, so not a huge deal. My primary concern is the failure of descriptions to be spoken when personalities are enabled. If this is an issue that can't be fixed, it's not a deal breaker for me by any means. I enjoy the descriptions I get without personalities, so probably never would have even discovered this particular bug if I hadn't read that personalities had been given a makeover. In closing, I love Piccy bot and really appreciate all the hard work that has gone into it.
Thanks!

Personality problems

The thing is, I thought it was only me until I sat down to read these comments, but yeah. I only tried it once. I turned off the personality feature because it’s not really my thing—it’s kind of funny, I don’t like the overly descriptive tone. So when I heard the personality feature had been revamped, I figured, OK, let me try it again and see if the issue’s been fixed. I love the fact that it’s not overly exaggerated anymore and it works really well.

But here’s the problem—like the other comment mentioned, once you turn on the personality feature, the voice doesn’t work. The first time I tried it with personality on, I actually heard the voice when the result appeared on the screen. But now it doesn’t seem to work anymore. Once I turn off the personality feature, everything goes back to normal and the voice works fine. For some reason, as soon as the personality feature is turned on, you don’t hear anything—the voice stops speaking.

Personally, I like having the voice on, and I also keep the sound on while the video is being prepared for description so I know something is happening on the screen. When I don’t hear the sound anymore, I assume the video’s done and that’s when I go read it. Normally, I’d prefer to read the description with VoiceOver on my iPhone directly, but for something like this, where the description pops up on the screen, I like having the voice so I know when the result is ready. I can always pause the speaking and read the text that way, but it’s good to have that audio feedback.

I don’t know if this would be overkill or not, but for people who choose to have the voice on, I feel like it’s helpful. If someone prefers to turn the voice off and only read the text with VoiceOver, it would be nice to have an audio cue to let them know the video is done processing. I’d love this feature. I’m not planning to turn the voice off because I really like the voices right now, but if I ever did, I’d want some kind of audio cue—like a chime or a little tone to let me know the description’s ready, instead of relying on silence when the preparation sound stops. Maybe it could even be a vibration, or give users the option to choose sound, vibration, or both. I think that would be really useful and a nice addition. Plus, it would be helpful if you decided to turn off the preparation sound, along with the voice. When the video is done processing and the description is on the screen, you wouldn't have to rely on the dead silence to let you know that information

Oh, another thing I forgot to mention—does it work with the Magic Tap, the two-finger double tap to start and stop when it’s speaking? When reading the description. I’d love to be able to pause or stop the voice without having to search around the screen for the play or pause button. It would make it way easier and faster if the Magic Tap could control the speech while the description is being read out loud.

Thanks again

Thanks for the list of models available for video. It would still be nice to have some kind of information on this in the app, whether this is a label on the ones that support video or a second selection for video model. Really do love the video descriptions though, especially for videos of my nieces.

The startup guide is very good

Hello Martijn,
I think as an increasingly mature APP, it is very necessary to have a startup guide. I'm glad you implemented this feature. However, I noticed an issue: your APP supports over fifty languages, but at least on the startup guide page, many texts still only display in English. Taking the Chinese I use as an example, I found that some parts of the startup guide text can already be displayed in Chinese, but there are still some parts showing in English. Could you further localize and translate these texts? If you need help, at least for the Chinese part, I can offer assistance.

retry button

the retry button, i believe, is right below the play button? it is accessible by swipe but not by touch. can we update this so it is accessible by explore/touch? thanks.

No audio for long video descriptions

As Missy and Winter reported, currently the new audio descriptions with personality 'on' won't work for very long descriptions. The TTS model can't cope with these longer descriptions. Looking into a way around this. For now, the only thing you can do is to reduce the length of the description (set length to 40 or so). Then the audio description should work. If you just want to use voiceover instead, set voice to 'none' and length to 100.

Carter, I will look at the translation of the intro pages, thanks for pointing out Chinese is not working properly yet. Appreciate the offer for help, please contact me privately?

LaBoheme, thanks for noting the retry accessibility, should be no issue to improve that, it's on the list now.

Thanks again guys!

New AI app for describing images and video: PiccyBot

Options

Comments