Hi all!
One day I realized that getting a screenshot described took too many steps: take the screenshot, manage to tap the "Share" button in time, select ChatGPT, write a prompt, wait for a response. That is a lot of work just to look at a meme a friend sent you in a messenger :) Add-ons that do the same job at the press of a button have long existed for Windows and NVDA. If I understand correctly, the problem with iOS is that a third-party application simply cannot read the screen contents, show windows on top of the current application, and the like, which is why all similar solutions involve sending the original image through the "Share" dialog. We are therefore limited to the capabilities of the iOS Shortcuts app.
Acknowledgements:
@aaron ramirez, for the excellent Shortcut; studying it showed me that the capabilities of iOS Shortcuts are generally sufficient for this kind of task;
Carter Temm, for the great NVDA add-on, which introduced me to the most popular models used for generating image descriptions.
Quick Start
For those who are not interested in reading long manuals, installation instructions and feature descriptions, here is a link to the Shortcut itself; its initial setup should be intuitive. Simply assign a VoiceOver command for running this Shortcut, after which it will take a screenshot and offer a menu of available options.
Current functionality
The Shortcut currently supports generating image descriptions with several popular models, and adding a new model whose API is OpenAI-compatible is straightforward. The Shortcut's current functionality includes:
Getting image descriptions using a given model;
Optical text recognition from images using the OCR engine built into iOS (see the sketch after this list);
Asking follow-up questions about the image being described;
Using a screenshot as the input image, or sending your own image through the "Share" dialog;
Rendering lists, tables and other formatting elements in the generated image description;
Copying the last answer to the clipboard;
Optional sound notification upon completion of description generation;
The ability to get answers in any language supported by the model (due to technical limitations of Shortcuts, the language itself must be specified manually).
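For the technically curious: the text recognition feature relies on the OCR engine that iOS itself provides. The same on-device engine is exposed to native apps through the Vision framework; here is a minimal Swift sketch of that API (the function name and structure are my own, not taken from the Shortcut):

```swift
import Vision
import UIKit

// Minimal sketch of on-device OCR with the Vision framework -- the same
// engine that backs the text-recognition action the Shortcut uses.
// All names here are illustrative.
func recognizeText(in image: UIImage) throws -> String {
    guard let cgImage = image.cgImage else { return "" }
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // favor accuracy over speed
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
    // Join the best candidate from every recognized line of text.
    let lines = (request.results ?? []).compactMap {
        $0.topCandidates(1).first?.string
    }
    return lines.joined(separator: "\n")
}
```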
Setting things up
Create an API key (or use an existing one) on one of the following platforms (a request sketch follows the list):
OpenAI Platform -- paid, supported models are 'GPT-5.2 Pro', 'GPT-5.2', 'GPT-5 Mini' and 'GPT-5 Nano';
Google AI Studio -- provides a free tier, supported models are 'Gemini 3 Pro', 'Gemini 3 Flash' and 'Gemini 2.5 Flash-Lite';
Anthropic Developer Platform -- paid, supported models are 'Claude Opus 4.6', 'Claude Sonnet 4.5' and 'Claude Haiku 4.5';
xAI Developer Console -- paid, supported models are 'Grok 4' and 'Grok 4 Fast';
Mistral AI Console -- provides a free tier, supported models are 'Mistral Large 3', 'Mistral Medium 3.1' and 'Mistral Small 3.2';
Pollinations AI -- provides a limited number of free tokens, supported model is 'Pollinations';
Groq Cloud Console -- provides a free tier for personal usage, supported model is 'Llama 4'.
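All of these platforms expose an OpenAI-style chat-completions endpoint, which is the protocol the Shortcut builds its requests around. For orientation, here is a rough Swift sketch of such a vision request; the endpoint, model identifier and prompt are placeholder assumptions, and the Shortcut's actual payload may differ in details:

```swift
import Foundation

// Rough sketch of an OpenAI-compatible vision request, similar in spirit to
// what the Shortcut assembles. Endpoint, model name and API key are
// placeholders -- substitute the values for your provider.
func describeImage(_ imageData: Data, apiKey: String) async throws -> String {
    let endpoint = URL(string: "https://api.openai.com/v1/chat/completions")!
    let body: [String: Any] = [
        "model": "gpt-5-mini",        // assumed model identifier
        "max_tokens": 1000,
        "messages": [[
            "role": "user",
            "content": [
                ["type": "text", "text": "Describe this image in detail."],
                ["type": "image_url", "image_url": [
                    "url": "data:image/png;base64,\(imageData.base64EncodedString())"
                ]]
            ]
        ]]
    ]
    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    // The answer text sits at choices[0].message.content -- the same path the
    // Shortcut calls 'choices.1.message.content', since Shortcuts counts from 1.
    guard let json = try JSONSerialization.jsonObject(with: data) as? [String: Any],
          let choices = json["choices"] as? [[String: Any]],
          let message = choices.first?["message"] as? [String: Any],
          let content = message["content"] as? String else {
        throw URLError(.cannotParseResponse)
    }
    return content
}
```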
Install the Shortcut by following the link.
A dialog will appear asking you to configure a few settings:
Play sound: determines whether to play a sound when description generation completes, enter 'y' or 'n';
Description model: the model that will be used for generating image descriptions, choose one for which you have an API key;
Model API key: the API key for the model chosen in the previous step;
Description prompt: the prompt that will be sent to the model to get an image description, enter '/default' to use the preset prompt or type your own;
Max tokens: the maximum number of tokens that may be used per request;
Language: the language in which the model will generate responses, enter the full name of the language, for example 'English'.
Assign a VoiceOver command for running this Shortcut:
Go to Settings --> Accessibility --> VoiceOver --> Commands --> All Commands --> Shortcuts --> CloudVision;
Assign a gesture or keyboard command to this Shortcut.
Usage instructions
Perform the VoiceOver command, or select CloudVision in the "Share" dialog, to get an image description.
A menu will appear with the following options:
Describe image: get an image description using the model selected in the Shortcut settings;
Recognize text: recognize text in the image using the OCR engine built into the system;
Cancel: quit.
After selecting one of the options, it will take some time to receive a response (description generation can take about ten seconds, while text recognition is almost instantaneous).
The results of the image analysis appear in a separate window. Once you have read them, close the window using the button in the upper right corner.
Once you have viewed the generated description, a menu will appear with the following options:
Chat with model: ask an additional question about the image being analyzed;
Copy last answer: copy the model's last answer to the clipboard;
Cancel: quit.
After selecting the "Chat with model" option, a dialog will appear asking you to enter your question.
As with the original description, the generated response will appear in a separate window after a short wait.
After viewing the response, you can continue asking follow-up questions or end your interaction with the Shortcut.
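As an aside, follow-up questions in OpenAI-style APIs are simply extra entries in the same messages array; the model keeps the image in context because the entire conversation is resent with every turn. A small Swift sketch with invented values:

```swift
// Sketch of how a follow-up question extends the conversation in the
// OpenAI-style format. Everything below is illustrative: the image data URL
// is truncated and the description text is invented.
let imageDataURL = "data:image/png;base64,iVBORw0..."  // base64 screenshot, truncated
var messages: [[String: Any]] = [
    ["role": "user", "content": [
        ["type": "text", "text": "Describe this image in detail."],
        ["type": "image_url", "image_url": ["url": imageDataURL]]
    ]],
    // The answer already received, echoed back so the model keeps context:
    ["role": "assistant", "content": "A solar system wallpaper behind a grid of apps."]
]
// Each "Chat with model" turn appends the new question and repeats the
// request with the full history:
messages.append(["role": "user", "content": "Which planet sits next to the Settings icon?"])
```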
Adding your own models
The following instructions involve working extensively with the Shortcut's implementation. If needed, detailed instructions for editing Shortcuts can be found here.
1. Open the Shortcuts app, find the CloudVision Shortcut and, using the VoiceOver rotor, select the "Edit" action.
2. Find the description_model variable; in the text field located right before it, enter the human-readable name of the model you are adding.
3. Find the description_model_api_key variable; in the text field located right before it, enter your API key (if necessary).
4. Find the description_models variable; using the "Add New Item" button located right before it, create an entry for your model, selecting Dictionary as the value type.
5. For the key, enter the model name specified in step 2.
6. Click the value to open the dictionary editing screen.
7. Create the following entries with the parameters for your model (a complete example follows these steps):
Required, type: text, key: 'url', value: the URL to which requests will be sent, for example 'https://api.openai.com/v1/chat/completions';
Optional, type: text, key: 'user_agent', value: the User-Agent with which requests to the model will be sent; if omitted, the default value is 'curl/8.4.0';
Required, type: text, key: 'model_name', value: the value of the model field in the request;
Optional, type: text, key: 'request_messages_key', value: the key under which the request carries the array of messages; if omitted, the default value is 'messages';
Optional, type: text, key: 'request_tokens_key', value: the key under which the request carries the maximum number of tokens; if omitted, the default value is 'max_tokens';
Optional, type: dictionary, key: 'additional_parameters', value: a dictionary whose elements are added directly to the request; can be used to specify parameters such as 'max_tokens' or 'temperature'; if omitted, the default is an empty dictionary, i.e. no additional parameters are added;
Optional, type: text, key: 'response_messages_key', value: the key (or path) at which the response contains the text of the answer; if omitted, the default value is 'choices.1.message.content';
Optional, type: text, key: 'response_error_key', value: the key at which the response contains a possible error message; if omitted, the default value is 'error'.
Note: if you want to omit any of the fields marked as optional, simply do not create the corresponding entry in the dictionary.
8. After filling in all the specified fields, you can finish editing the Shortcut.
To switch between already added models, simply assign the description_model variable a value that matches the corresponding key in the description_models dictionary.
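To make the structure concrete, here is what a complete entry for a hypothetical OpenAI-compatible model could look like, written as a Swift dictionary literal; in the Shortcuts editor you would build the same thing out of nested Dictionary items:

```swift
// Hypothetical description_models entry, expressed as a Swift literal.
// Only 'url' and 'model_name' are required; every other key shows its default.
let myVisionModel: [String: Any] = [
    "url": "https://api.example.com/v1/chat/completions",  // required; placeholder endpoint
    "model_name": "my-vision-model",                       // required; request's model field
    "user_agent": "curl/8.4.0",                            // optional; default shown
    "request_messages_key": "messages",                    // optional; default shown
    "request_tokens_key": "max_tokens",                    // optional; default shown
    "additional_parameters": ["temperature": 0.2],         // optional; merged into the request
    "response_messages_key": "choices.1.message.content",  // optional; default (1-indexed path)
    "response_error_key": "error"                          // optional; default shown
]
```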
Final thoughts
I hope you find this Shortcut useful. You can leave any questions, bug reports and suggestions in the comments below.
Comments
Re: Error when running Shortcut
If you have not updated to the version dated January 28, try doing so; the link is in the original post. The API the Shortcut used has changed and may no longer work. If you continue to receive a similar error, the only option is to use the paid models with your own API key.
Update
Shortcut has been updated, the new version can be installed from this link or from the original post.
The API I previously used to get image descriptions is no longer available, so use your own API key with one of the paid models. Llama 3.2 has also been added; this model is free, but using it also requires an API key from the Groq Cloud Console.
Gemini?
Would you consider adding Gemini? Asking mainly because I already have a Gemini API key, and, well, I actually like Gemini.
Update
The Shortcut has been updated. All models now require an API key; however, there are also free options, for which you only need to create the appropriate account. The following models are currently supported: 'GPT 5.2' (paid), 'Gemini 3 Pro' (paid), 'Gemini 3 Flash' (free), 'Grok 4' (paid), 'Pollinations' (free) and 'Llama 4' (free, requires a Groq Cloud Console API key).
Shortcut can be installed from this link or from the original post.
Thanks, works great!
Using Gemini with my API key. Thank you for including that model. Appreciate it.
Using the shortcut on my home screen
The wallpaper on this iPhone is a captivating and scientifically-inspired depiction of our solar system, set against the vast, velvety blackness of deep space. It serves as a celestial backdrop that gives the organized grid of apps a sense of floating in a grand, cosmic void.
At the very heart of the image, positioned almost perfectly behind the "Games" and "Travel" folders in the bottom center, is a brilliant, glowing white light representing the Sun. It doesn't have a hard edge but instead radiates a soft, golden-white halo that subtly illuminates the space immediately around it.
Emanating from this central light are several thin, delicate, and perfectly concentric circles. These represent the orbital paths of the planets. They are rendered in a faint, translucent white, appearing almost like gossamer threads traced onto the darkness. Because of the perspective, these circles appear as elongated ellipses, stretching wide across the screen and disappearing behind the edges of the app icons.
Various planets are meticulously placed along these orbital lines, each appearing as a tiny, detailed sphere.
• Closest to the sun, you can see small, dark specks representing the inner planets like Mercury and Venus.
• Moving outward, Earth is visible as a tiny, vibrant blue and white marble, positioned just below the "App Store" icon.
• Mars appears as a small, distinct reddish-orange dot nearby.
• Further out, the gas giants become more prominent. Jupiter is visible as a larger, tan-colored sphere with faint horizontal bands.
• The most recognizable is Saturn, which is beautifully detailed with its iconic rings tilted at an angle. It sits elegantly between the "Settings" icon and the "Travel" folder.
• In the furthest reaches, near the top and bottom corners, you can spot the pale blue and teal hues of Uranus and Neptune.
The background itself isn't just a flat black; it's a deep "starfield." Scattered across the entire screen are hundreds of tiny, shimmering pinpricks of light. Most are brilliant white, but if you look closely, some have a faint blue or reddish tint, mimicking the distant stars and galaxies of the night sky.
The overall effect of the wallpaper is one of serene order and immense scale. The thin lines of the orbits provide a geometric structure that complements the square grid of the apps, making the phone's interface feel like a window looking out into the organized beauty of the universe.
Disclaimer: this is the solar system wallpaper that comes pre-installed on iPhones. Currently, this is what I have active on my iPhone SE 2022, running iOS 18.7.2. This description was provided by Gemini, via the shortcut created by the OP of this thread.
Update
Shortcut has been updated, the current version can be installed from this link or from the original post.
Several new models have been added, as well as alternative models from existing providers. Below is a list of providers and the models that can be used with their API keys:
OpenAI Platform -- paid, supported models are 'GPT-5.2 Pro', 'GPT-5.2', 'GPT-5 Mini' and 'GPT-5 Nano';
Google AI Studio -- provides a free tier, supported models are 'Gemini 3 Pro', 'Gemini 3 Flash' and 'Gemini 2.5 Flash-Lite';
Anthropic Developer Platform -- paid, supported models are 'Claude Opus 4.6', 'Claude Sonnet 4.5' and 'Claude Haiku 4.5';
xAI Developer Console -- paid, supported models are 'Grok 4' and 'Grok 4 Fast';
Mistral AI Console -- provides a free tier, supported models are 'Mistral Large 3', 'Mistral Medium 3.1' and 'Mistral Small 3.2';
Pollinations AI -- provides a limited number of free tokens, supported model is 'Pollinations';
Groq Cloud Console -- provides a free tier for personal usage, supported model is 'Llama 4'.
An option to specify the maximum number of tokens during initial setup has also been added. The higher the limit, the longer and more detailed the answers, but the faster your model quota is exhausted.