I don't know where to put this so I hope this is the right place.
I just typed "audio description" into YouTube and came across this.
I honestly don't like it, I feel like AI isn't there yet when it comes to describing things, also you can hear that the voice is AI if you listen carefully.
It's ok I guess for describing a nature doc or something but I feel it doesn't have that human touch so wouldn't work for a horror movie or comedy for example.
The idea is interesting, but I hope that people like game studios and so on don't pick this up because it's easy.
Anyway, here's their YouTube link: https://www.youtube.com/@Visonic-AI-m2c and here's their website: https://visonicai.com/
What do you guys think?
Comments
They should use the Microsoft Neural voices
Those Microsoft voices are so good. But again, someone needs to supervise how the TTS pronounces names etc. It shouldn't sound weird.
Also, I doubt this can be used for actual movies and shows. Adding AD to such content requires careful attention to place AD exactly at the right place, in between dialogs. An AI can certainly analyse the dialogs and the gaps between dialogs, but I doubt this one is doing it yet.
The page says it can analyse the movie for silences.
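For anyone wondering what "analysing the movie for silences" could look like, here is a minimal sketch. It assumes you already have the dialog timings as (start, end) pairs in seconds, for example from a subtitle file. The function name find_gaps and the 1.5 second threshold are made up for illustration, and this is not a claim about how Visonic actually does it.

```python
def find_gaps(dialog, total_length, min_gap=1.5):
    """Return (start, end) stretches of at least min_gap seconds with no dialog.

    dialog is a list of (start, end) times in seconds (hypothetical input,
    e.g. taken from subtitle timings).
    """
    gaps = []
    cursor = 0.0  # end time of the latest dialog seen so far
    for start, end in sorted(dialog):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # copes with overlapping lines
    # Check the stretch between the last line of dialog and the end of the clip.
    if total_length - cursor >= min_gap:
        gaps.append((cursor, total_length))
    return gaps


dialog = [(2.0, 5.0), (5.5, 9.0), (14.0, 16.0)]
print(find_gaps(dialog, total_length=20.0))
# prints [(0.0, 2.0), (9.0, 14.0), (16.0, 20.0)]
```

The half-second pause between the first two lines is skipped because it is shorter than the threshold. A real tool would probably work from the audio itself rather than subtitles, which is a much harder problem.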
I like AI for what it can do for us but then things like this come along.
I wonder how much practice they have at actually audio describing things? Do they have actual audio describers backing this product? I doubt it.
Their two demos show the exact same movie clip, one just slightly longer than the other, and I'm just not impressed at all.
Maybe I'm just old but I just can't see a world where something like this would be preferred over a human narrator.
Games
Honestly if it's a way to get more accessibility in games I'd be for it, even if we have to wean them off it later. A fair number of games are adding menu narration with no other accessibility features for us and if this gives them another simple drop in tool at least that's more than we would get otherwise.
re: games.
The issue there is that the person's voice can help connect the story together. For example, in a fast-paced scene, the narrator might speed up slightly or sound a bit more excited or whatever you need for that scene. This just sounds flat.
Also, giving devs tools like this will just make them lazy.
Re:Games
Long term I agree but menu narration has already made them lazy, they add it in and think that's a box checked off the list. It's the people who manage the money we need to convince as much as anyone though, they look at cost benefit so if we can reduce the cost it'll be easier to make it start being a thing. The more games that add it the more likely other devs are to take it seriously and the more chance we have to prove that it's worth the investment.
Ultimately I see the quality of game audio description as secondary to making it exist at all, sure we're making a battle for ourselves further down the road but it's still probably an easier process than having one holy hell of a battle to get human audio description as a widespread feature without an intermediate step.
I think it is awesome!
If audio describing can be made this easy and cost effective to implement, we could go back and have the olden golden movies audio described. It surely will take away voice-over jobs; it is going to hit hard on those artists. Honestly, I did not mind the voice of the audio description. It did not feel like it was done by AI, but I ain't a trained listener.
Potential
So it's interesting as a long term step. However the demos are short and not comprehensive. They put together description sentences from some unknown scenes, and there is no dialog and no action. But I would like to know if this approach can be adapted for other description apps. Seeing AI already has a video description function, but as far as I know it pauses the video and then describes it. I wonder if they can apply something like this to that function.
Interesting
It feels to me like AD should be something that AI could do really well. Let's face it, most AD actors aren't brimming with personality - that's not really their job - they are supposed to be quite neutral so as not to intrude - so it feels to me like AI could potentially work well.
If I was able to get AD on anything I cared to watch and didn't have to rely on someone else to provide it then that would be fantastic.
I think it also has the scope for giving us a much more customised experience.
For example, maybe we can select the voice we want and increase the speech rate. Maybe we could tell it what things we are specifically interested in knowing. Maybe we could tweak the personality to make it more or less engaging.
Maybe I could tweak the verbosity so I could get the details I find most important.
One thing that always bothers me about AD is that it takes away the enjoyment of music in tv or film because it's always ducked with a voice over the top. I could tell the AI not to read out the credits. Or conversely always tell me when the credits appear. Sometimes I'd like to be able to just tell it to shush so I can enjoy the music. Maybe I could pause the action and ask it to recap or give me some more details about what is on-screen.
Maybe I could get the AD playing out of my AirPods whilst the TV plays the normal sound without ducking. This would also be good for sighted people watching with me.
I know there is a big difference between a pre-recorded AI track and something done in real time. And the problem with AD is that it has to fit in tiny gaps in dialogue.
I think if we decoupled the AD from the program then it has the potential to allow its technology to progress, and possibly allow different things to be tried out. Maybe sometimes I would accept it if it paused the program to tell me some more details before carrying on if the gap isn't big enough.
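To make the "pause if the gap isn't big enough" idea concrete, here is a rough sketch. It assumes the speaking time can be estimated from the word count and a chosen speech rate. The function name plan_description, the 180 words per minute default, and the two action labels are all hypothetical and not taken from any existing product.

```python
def plan_description(text, gap_seconds, words_per_minute=180):
    """Decide whether a description fits the gap or the program must pause.

    Returns (action, extra_seconds), where extra_seconds is how long the
    program would have to be held to finish the description.
    """
    words = len(text.split())
    needed = words * 60 / words_per_minute  # estimated speaking time in seconds
    if needed <= gap_seconds:
        return ("play_in_gap", 0.0)
    return ("pause_program", needed - gap_seconds)


line = "She opens the door and steps into the dark hallway."
print(plan_description(line, gap_seconds=4.0))
print(plan_description(line, gap_seconds=2.0))
print(plan_description(line, gap_seconds=2.0, words_per_minute=300))
```

With the default rate the ten-word line fits a four second gap but not a two second one. Raising the speech rate to 300 words per minute makes it fit the shorter gap, which is the kind of trade-off a listener could choose for themselves.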
Away from AD, I know AI is being used to try to supplement what's going on, so telling you which actors are on screen or whatever. Sometimes I would love to be able to just say "wait, who is that again?" or maybe I've forgotten the context behind a scene and would like a little reminder. In some ways it could all be tied together so instead of AD being one-way maybe it becomes 2-way.
@mr grieves and this is why I love AppleVis.
I come in all, this is bad, we shouldn't use this, blah blah blah, and you've now let me see this in a different light.
Those are very interesting ideas.
Now you mention it, a narrator with a personality would be nice, you're right that they're lacking in that department.
Alright mr grieves, you've convinced me not to be so negative, let's see where this goes.
re:games
Alright, I'll be positive, it could work out nicely.
The only issue is that it's all well and good adding AD, but what about navigation assistance? What if that AD just becomes another box that's ticked?
I'm trying to be positive, trust me, I love games, but I can see this going wrong so fast.
Anyway, let's see where it goes.