Yet another shortcut to make getting descriptions of screenshots easy

By user26335377, 18 January, 2026

Forum: iOS and iPadOS

Hi all!

One day I realized that getting descriptions of screenshots took too many steps: take a screenshot, tap the "Share" button before it disappears, select ChatGPT, write a prompt, wait for a response. That is a lot of work just to look at a meme a friend sent you in a messenger :) On Windows there have long been NVDA add-ons that perform the same task at the press of a key. If I understand correctly, the problem on iOS is that a third-party application simply cannot read screen content, show windows on top of the current application, and so on, which is why all similar solutions involve sending the original image through the "Share" dialog. We are therefore limited to what the built-in iOS Shortcuts app provides.

Acknowledgements:

  • @aaron ramirez for the excellent Shortcut; studying it showed me that the capabilities of iOS Shortcuts are generally sufficient to solve this kind of problem;

  • Carter Temm for the great NVDA add-on, which introduced me to the most popular models used for generating image descriptions.

Quick Start

For those who are not interested in reading long manuals, installation instructions and feature descriptions, here is a link to the Shortcut itself; its initial setup should be intuitive. Simply assign a VoiceOver command to run the Shortcut, after which it will take a screenshot and offer a menu of available options.

Current functionality

The Shortcut currently supports generating image descriptions using several popular models, and adding a new model whose API is compatible with OpenAI is straightforward. The Shortcut's current functionality includes:

  • Getting image descriptions using a given model;

  • Optical character recognition (OCR) from images using the engine built into iOS;

  • Follow-up questions about the image whose description the model has generated;

  • Using a screenshot as the input image, or sending your own image through the "Share" dialog;

  • Displaying lists, tables and other formatting elements in the generated image description;

  • Copying the last answer to the clipboard;

  • Optional sound notification upon completion of description generation;

  • The ability to get answers in any language supported by the model (due to technical limitations of Shortcuts, the language itself must be specified manually).
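For context, most of the services listed in the next section expose OpenAI-compatible chat endpoints, where the image travels inside the message content as a base64 data URL. A minimal Python sketch of the payload such a request carries (the model name and prompt here are placeholders, not the Shortcut's actual values):

```python
import base64


def build_vision_request(image_bytes: bytes, prompt: str,
                         model: str = "gpt-5-mini",  # placeholder model id
                         max_tokens: int = 1024) -> dict:
    """Assemble an OpenAI-compatible chat payload that attaches an
    image as a base64 data URL, roughly what the Shortcut sends."""
    data_url = ("data:image/png;base64,"
                + base64.b64encode(image_bytes).decode("ascii"))
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
    }


payload = build_vision_request(b"\x89PNG...", "Describe this image in English.")
```

The language setting mentioned above simply ends up inside the prompt text, which is why it has to be spelled out manually.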

Setting things up

  1. Create (or use an existing) API key on one of the following platforms:

    • OpenAI Platform -- paid, supported models are 'GPT-5.2 Pro', 'GPT-5.2', 'GPT-5 Mini' and 'GPT-5 Nano';

    • Google AI Studio -- provides a free tier, supported models are 'Gemini 3 Pro', 'Gemini 3 Flash' and 'Gemini 2.5 Flash-Lite';

    • Anthropic Developer Platform -- paid, supported models are 'Claude Opus 4.6', 'Claude Sonnet 4.5' and 'Claude Haiku 4.5';

    • xAI Developer Console -- paid, supported models are 'Grok 4' and 'Grok 4 Fast';

    • Mistral AI Console -- provides a free tier, supported models are 'Mistral Large 3', 'Mistral Medium 3.1' and 'Mistral Small 3.2';

    • Pollinations AI -- offers a limited number of free tokens, supported model is 'Pollinations';

    • Groq Cloud Console -- provides a free tier for personal usage, supported model is 'Llama 4'.

  2. Install Shortcut by following the link.

  3. A dialog will appear asking you to configure a few settings:

    1. Play sound: determines whether to play a sound after description generation is complete; enter 'y' or 'n';

    2. Description model: the model that will be used for generating image descriptions; choose one for which you have an API key;

    3. Model API key: the API key for the model chosen at the previous step;

    4. Description prompt: the prompt that will be sent to the model to get an image description; enter '/default' to use the preset prompt, or enter your own;

    5. Max tokens: the maximum number of tokens that will be used during the request execution;

    6. Language: the language in which the model will generate responses, enter the full name of the language, for example, 'English'.

  4. Assign a VoiceOver command for executing this Shortcut:

    1. Go to Settings --> Accessibility --> VoiceOver --> Commands --> All Commands --> Shortcuts --> CloudVision;

    2. Assign a gesture or a keyboard command for this Shortcut.

Usage instructions

  1. Perform the VoiceOver command, or select CloudVision in the "Share" dialog, to get an image description.

  2. A menu will appear with the following options:

    1. Describe image: get an image description using the model selected in the Shortcut settings;

    2. Recognize text: Recognize text from an image using the OCR-Engine built into the system;

    3. Cancel: quit.

  3. After selecting one of the options, it will take some time to receive a response (in the case of description generation, this can be about ten seconds, while text recognition occurs almost instantly).

  4. The results of the image analysis will appear in a separate window. Once you have reviewed them, close the window using the button in the upper right corner.

  5. After you close the description window, a menu will appear with the following options:

    1. Chat with model: ask an additional question about the image being analyzed;

    2. Copy last answer: Copy the model's last answer to the clipboard;

    3. Cancel: quit.

  6. After selecting the "Chat with model" option, a dialog will appear asking you to enter your question.

  7. Similar to the original description, the generated response will appear in a separate window after a while.

  8. After viewing the received response, you can continue asking follow-up questions or end your interaction with the Shortcut.

Adding your own models

The following instructions involve extensive interaction with the Shortcut's implementation. If needed, detailed instructions for creating Shortcuts can be found here.

  1. Open the Shortcuts app, find the CloudVision Shortcut and, using the VoiceOver rotor, select the "Edit" action.

  2. Find the description_model variable and, in the text field located right before it, enter the human-readable name of the model you are adding.

  3. Find the description_model_api_key variable and, in the text field located right before it, enter your API key (if necessary).

  4. Find the description_models variable and, using the "Add New Item" button located right before it, create an entry for your model, selecting Dictionary as the value type.

  5. For the key, enter the model name specified in step 2.

  6. Tap the value to go to the dictionary editing screen.

  7. Create the following entries with parameters for your model:

    1. Required, type: text, key: 'url', value: the URL to which requests will be sent, for example 'https://api.openai.com/v1/chat/completions';

    2. Optional, type: text, key: 'user_agent', value: the User-Agent with which requests to the model will be sent; if omitted, the default value is 'curl/8.4.0';

    3. Required, type: text, key: 'model_name', value: the value of the 'model' field in the request;

    4. Optional, type: text, key: 'request_messages_key', value: the key under which the request carries the array of messages; if omitted, the default value is 'messages';

    5. Optional, type: text, key: 'request_tokens_key', value: the key under which the request carries the maximum token count; if omitted, the default value is 'max_tokens';

    6. Optional, type: dictionary, key: 'additional_parameters', value: a dictionary whose elements are added to the request verbatim; can be used to specify parameters such as 'max_tokens' or 'temperature'; if omitted, the default is an empty dictionary, i.e. no additional parameters are added to the request;

    7. Optional, type: text, key: 'response_messages_key', value: the key (or path) at which the response contains the text of the answer; if omitted, the default value is 'choices.1.message.content';

    8. Optional, type: text, key: 'response_error_key', value: the key at which the response contains a possible error message; if omitted, the default value is 'error'.

    Note: to omit any of the fields marked as optional, simply do not create a dictionary entry for that field.

  8. After filling in all the specified fields, you can complete editing the Shortcut.

  9. To switch between already added models, simply set the description_model variable to the key of the corresponding entry in the description_models dictionary.
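Taken together, a model entry is just a small dictionary. The sketch below uses Python to stand in for the Shortcuts dictionary; the endpoint and model name are illustrative, and treating the numeric segment in the default response path as a 1-based array index is my reading of the Shortcut's convention. It shows an entry, the defaults applied to omitted optional keys, and how the default response path would pull the answer out of an OpenAI-style reply:

```python
# Illustrative entry for an OpenAI-compatible endpoint; only 'url' and
# 'model_name' are required. Every other key falls back to a default.
MY_MODEL = {
    "url": "https://api.openai.com/v1/chat/completions",
    "model_name": "gpt-5-mini",  # placeholder model identifier
}

# Defaults for the optional keys, as listed in step 7 above.
DEFAULTS = {
    "user_agent": "curl/8.4.0",
    "request_messages_key": "messages",
    "request_tokens_key": "max_tokens",
    "additional_parameters": {},
    "response_messages_key": "choices.1.message.content",
    "response_error_key": "error",
}


def resolve(entry: dict) -> dict:
    """Merge an entry over the defaults: explicit keys win."""
    return {**DEFAULTS, **entry}


def get_by_path(obj, path: str):
    """Walk a dotted key path; numeric segments index arrays,
    counted from 1 (as the default 'choices.1.message.content'
    suggests the Shortcut does)."""
    for part in path.split("."):
        if isinstance(obj, list):
            obj = obj[int(part) - 1]  # 1-based -> 0-based
        else:
            obj = obj[part]
    return obj


config = resolve(MY_MODEL)

# Fabricated OpenAI-style response, for illustration only.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "A cat on a windowsill."}}
    ]
}
answer = get_by_path(response, config["response_messages_key"])
```

If your endpoint nests the answer differently, only 'response_messages_key' needs to change; the rest of the entry stays the same.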

In closing

I hope you find this Shortcut useful. Questions, bug reports and suggestions are welcome in the comments below.

Comments

By user26335377 on Tuesday, February 3, 2026 - 20:24

If you have not updated to the version dated January 28, try doing so; the link is in the original post. The API that the Shortcut uses has changed and may no longer work. If you still receive this error, the only way out is to use paid models with your own API key.

By user26335377 on Thursday, February 5, 2026 - 15:23

The Shortcut has been updated; the new version can be installed from this link or from the original post.

The API I previously used to get image descriptions is no longer available; use your own API key with one of the paid models. Llama 3.2 has also been added. This model is free, but using it likewise requires an API key from the Groq Cloud Console.

By Brian on Thursday, February 5, 2026 - 15:44

Would you consider adding Gemini? Asking mainly because I already have a Gemini API key, and well, I actually like Gemini. πŸ˜›

By user26335377 on Thursday, February 5, 2026 - 20:52

The Shortcut has been updated. All models now require an API key, though there are also free options; to use them you just need to create an account with the corresponding provider. The following models are currently supported: 'GPT 5.2' (paid), 'Gemini 3 Pro' (paid), 'Gemini 3 Flash' (free), 'Grok 4' (paid), 'Pollinations' (free) and 'Llama 4' (free, requires a Groq Cloud Console API key).

Shortcut can be installed from this link or from the original post.

By Brian on Thursday, February 5, 2026 - 22:50

Using Gemini with my API key. Thank you for including that model. Appreciate it. πŸ™‚

By Brian on Thursday, February 5, 2026 - 23:06

The wallpaper on this iPhone is a captivating and scientifically-inspired depiction of our solar system, set against the vast, velvety blackness of deep space. It serves as a celestial backdrop that gives the organized grid of apps a sense of floating in a grand, cosmic void.
At the very heart of the image, positioned almost perfectly behind the "Games" and "Travel" folders in the bottom center, is a brilliant, glowing white light representing the Sun. It doesn't have a hard edge but instead radiates a soft, golden-white halo that subtly illuminates the space immediately around it.
Emanating from this central light are several thin, delicate, and perfectly concentric circles. These represent the orbital paths of the planets. They are rendered in a faint, translucent white, appearing almost like gossamer threads traced onto the darkness. Because of the perspective, these circles appear as elongated ellipses, stretching wide across the screen and disappearing behind the edges of the app icons.
Various planets are meticulously placed along these orbital lines, each appearing as a tiny, detailed sphere.
β€’ Closest to the sun, you can see small, dark specks representing the inner planets like Mercury and Venus.
β€’ Moving outward, Earth is visible as a tiny, vibrant blue and white marble, positioned just below the "App Store" icon.
β€’ Mars appears as a small, distinct reddish-orange dot nearby.
β€’ Further out, the gas giants become more prominent. Jupiter is visible as a larger, tan-colored sphere with faint horizontal bands.
β€’ The most recognizable is Saturn, which is beautifully detailed with its iconic rings tilted at an angle. It sits elegantly between the "Settings" icon and the "Travel" folder.
β€’ In the furthest reaches, near the top and bottom corners, you can spot the pale blue and teal hues of Uranus and Neptune.
The background itself isn't just a flat black; it’s a deep "starfield." Scattered across the entire screen are hundreds of tiny, shimmering pinpricks of light. Most are brilliant white, but if you look closely, some have a faint blue or reddish tint, mimicking the distant stars and galaxies of the night sky.
The overall effect of the wallpaper is one of serene order and immense scale. The thin lines of the orbits provide a geometric structure that complements the square grid of the apps, making the phone's interface feel like a window looking out into the organized beauty of the universe.
Disclaimer, this is the solar system wallpaper that comes pre-installed on iPhones. Currently, this is what I have active on my iPhone SE 2022, running iOS 18.7.2. This description was provided by Gemini, via the shortcut created by the OP of this thread. 😎

By user26335377 on Friday, February 6, 2026 - 16:42

The Shortcut has been updated; the current version can be installed from this link or from the original post.

Several new models have been added, as well as alternative models from existing providers. Below is a list of providers and the models that can be used with their API keys:

  • OpenAI Platform -- paid, supported models are 'GPT-5.2 Pro', 'GPT-5.2', 'GPT-5 Mini' and 'GPT-5 Nano';

  • Google AI Studio -- provides a free tier, supported models are 'Gemini 3 Pro', 'Gemini 3 Flash' and 'Gemini 2.5 Flash-Lite';

  • Anthropic Developer Platform -- paid, supported models are 'Claude Opus 4.6', 'Claude Sonnet 4.5' and 'Claude Haiku 4.5';

  • xAI Developer Console -- paid, supported models are 'Grok 4' and 'Grok 4 Fast';

  • Mistral AI Console -- provides a free tier, supported models are 'Mistral Large 3', 'Mistral Medium 3.1' and 'Mistral Small 3.2';

  • Pollinations AI -- offers a limited number of free tokens, supported model is 'Pollinations';

  • Groq Cloud Console -- provides a free tier for personal usage, supported model is 'Llama 4'.

An option to specify the maximum number of tokens during initial setup has been added. The more tokens allowed, the longer and more detailed the answer, but the faster your model's limits are exhausted.