Hi all!
One day I realized that getting a description of a screenshot takes too many steps: take the screenshot, manage to tap the "Share" button, select ChatGPT, write a prompt, wait for a response. That is a lot of effort just to look at a meme a friend sent you in a messenger :) There have long been add-ons for Windows and NVDA that do the same job at the press of a button. If I understand correctly, the problem on iOS is that a third-party application simply cannot read the screen contents, show windows on top of the current application, and so on, which is why all similar solutions involve sending the original image through the "Share" dialog. We are therefore limited to what the iOS Shortcuts app can do.
Acknowledgements:
@aaron ramirez for an excellent Shortcut; studying it showed me that the capabilities of iOS Shortcuts are generally sufficient to solve this kind of problem;
@alekssamos for a great NVDA add-on, which gave me the idea of using the free API, as well as of adding some extra features, such as text recognition.
Quick Start
For those who are not interested in reading long manuals, installation instructions and feature descriptions, here is a link to the Shortcut itself; its initial setup should be intuitive. Simply assign a VoiceOver command for running this Shortcut, after which it will take a screenshot and offer a menu of the available options.
Current functionality
Currently the Shortcut uses the Piccy Bot API to get free descriptions, but it is implemented in such a way that adding a new model whose API is ChatGPT compatible is not too difficult (a rough sketch of such a request follows the list below). The functionality supported by this Shortcut includes:
Getting image descriptions using a given model;
Optical text recognition using the OCR engine built into iOS;
Follow-up questions about the image the model has described;
Using a screenshot as the input image, or sending your own image through the "Share" dialog;
Displaying lists, tables and other formatting elements in the generated image description;
Copying the last answer to the clipboard;
Optional sound notification upon completion of description generation;
An option to automatically turn VoiceOver off while the screenshot is taken, so you don't have to worry about the current Screen Curtain state;
The ability to get answers in any language supported by the model (due to technical limitations of Shortcuts, the language itself must be specified manually).
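For readers curious about what "ChatGPT compatible" means in practice: the Shortcut itself is assembled from Shortcuts actions rather than code, but the request it builds follows the familiar chat-completions shape, roughly as in the Swift sketch below. The endpoint, model name, prompt and response parsing here are illustrative assumptions, not the exact values CloudVision uses.

```swift
import Foundation

// A minimal sketch of a ChatGPT-compatible image-description request.
// Endpoint, model name, prompt and API key are placeholders, not the
// values CloudVision actually ships with.
func describeImage(_ imageData: Data, prompt: String, apiKey: String) async throws -> String {
    // The prompt and the image travel together as one user message made of
    // mixed "text" and "image_url" content parts.
    let textPart: [String: Any] = ["type": "text", "text": prompt]
    let imagePart: [String: Any] = [
        "type": "image_url",
        "image_url": ["url": "data:image/png;base64,\(imageData.base64EncodedString())"]
    ]
    let userMessage: [String: Any] = ["role": "user", "content": [textPart, imagePart]]
    let body: [String: Any] = ["model": "gpt-4o-mini", "messages": [userMessage]]

    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)

    // The answer text normally sits at choices[0].message.content.
    guard let json = try JSONSerialization.jsonObject(with: data) as? [String: Any],
          let choices = json["choices"] as? [[String: Any]],
          let message = choices.first?["message"] as? [String: Any],
          let content = message["content"] as? String else {
        throw URLError(.cannotParseResponse)
    }
    return content
}
```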
Setting things up
Install the Shortcut by following the link.
A dialog will appear asking you to configure a few settings:
Play sound: determines whether a sound is played once description generation is complete; enter 'y' or 'n';
Disable VoiceOver: an experimental feature that turns VoiceOver off before the screenshot is taken and back on afterwards; it adds about a second to the delay before the options menu appears; enter 'y' or 'n';
Description prompt: the prompt that will be sent to the model to get an image description; enter '/default' to use the preset prompt, or enter your own prompt;
Language: the language in which the model will generate its responses; enter the full name of the language, for example 'English'.
Assign a VoiceOver command for running this Shortcut:
Go to Settings --> Accessibility --> VoiceOver --> Commands --> All Commands --> Shortcuts --> CloudVision;
Assign a gesture or keyboard command to this Shortcut.
Usage instructions
Perform the assigned VoiceOver command, or select CloudVision in the "Share" dialog, to get an image description.
A menu will appear with the following options:
Describe image: get an image description using the model selected in the Shortcut settings;
Recognize text: recognize text in the image using the OCR engine built into the system (see the sketch after this menu);
Cancel: quit.
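The "Recognize text" option uses the text recognizer built into iOS (presumably the same machinery Shortcuts exposes through its text-extraction action), so it runs on-device with no network round trip, which is why it feels instant. For developers, a minimal Swift sketch of that underlying Vision API looks roughly like this; it is not code taken from the Shortcut:

```swift
import Vision
import UIKit

// A rough Swift equivalent of on-device text recognition; not code taken
// from the Shortcut itself.
func recognizeText(in image: UIImage) throws -> [String] {
    guard let cgImage = image.cgImage else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate          // favour accuracy over speed

    // Runs synchronously and entirely on-device, hence the near-instant result.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // Each observation offers several candidates; keep only the best string.
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}
```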
After selecting one of the options, it will take some time to receive a response (description generation can take about ten seconds, while text recognition is almost instant).
The results of the image analysis will appear in a separate window; once you have read them, you can close the window using the button in the upper right corner.
After viewing the generated description, a menu will appear with the following options:
Chat with model: ask an additional question about the image being analyzed;
Copy last answer: copy the model's last answer to the clipboard;
Cancel: quit.
After selecting the "Chat with model" option, a dialog will appear asking you to enter your question.
Similar to the original description, the generated response will appear in a separate window after a while.
After viewing the received response, you can continue asking follow-up questions or end your interaction with the Shortcut.
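I have not dug into exactly how the Shortcut keeps track of the conversation internally, but with ChatGPT-compatible APIs a follow-up question is normally just another request whose messages array carries the earlier exchange along. A hypothetical Swift fragment (all texts made up) illustrating that shape:

```swift
// Hypothetical example of how a follow-up question can be expressed against a
// ChatGPT-compatible API: the earlier exchange simply rides along in "messages".
let describePart: [String: Any] = ["type": "text",
                                   "text": "Describe this image for a blind user."]
let imagePart: [String: Any] = ["type": "image_url",
                                "image_url": ["url": "data:image/png;base64,..."]]
let firstQuestion: [String: Any] = ["role": "user", "content": [describePart, imagePart]]
let firstAnswer: [String: Any] = ["role": "assistant",
                                  "content": "A grey cat is sitting on a windowsill next to a potted plant."]
let followUp: [String: Any] = ["role": "user", "content": "What is written on the flower pot?"]

// This array replaces the single user message in the next request body.
let messages: [[String: Any]] = [firstQuestion, firstAnswer, followUp]
```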
Adding your own models
The following instructions involve extensive interaction with the Shortcut implementation. If necessary, detailed instructions for creating Shortcuts can be found here.
Open the Shortcuts app, find CloudVision Shortcut and, using the VoiceOver rotor, select an "Edit" action.
Find the description_model variable; in the text field located right before it, enter the human-readable name of the model you are adding.
Find the description_model_api_key variable; in the text field located right before it, enter your API key (if necessary).
Find the description_models dictionary; using the "Add New Item" button, create an entry for your model, selecting Dictionary as the value type.
For the key, enter the model name you specified for the description_model variable above.
Click on the value to go to the dictionary editing screen.
Create the following entries with the parameters for your model:
Type: text, key: 'url', value: the URL to which requests will be sent, for example 'https://api.openai.com/v1/chat/completions';
Type: text, key: 'user_agent', value: the User-Agent with which requests to the model will be sent;
Type: text, key: 'model_name', value: the value of the model field in the request;
Type: text, key: 'request_messages_key', value: the key under which the request contains the array of messages, for example 'messages';
Type: dictionary, key: 'additional_parameters', value: a dictionary whose elements will be added directly to the request; it can be used to specify parameters such as 'max_tokens' or 'temperature';
Type: text, key: 'response_messages_key', value: the key (or path) at which the response contains the text of the received answer, for example 'choices.1.message.content'.
After filling in all the specified fields, you can complete editing the Shortcut.
To switch between already added models, simply assign the description_model variable a value that matches the corresponding key in the description_models dictionary.
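To make the parameters above more concrete, here is roughly what one finished entry of the description_models dictionary could look like if it were written out as a Swift dictionary instead of in the Shortcuts editor. The key name, endpoint, User-Agent, model and parameter values are illustrative assumptions, not values shipped with the Shortcut:

```swift
// One illustrative entry of the description_models dictionary, written out as
// a Swift dictionary. "My GPT-4o", the endpoint, User-Agent and parameter
// values are examples only.
let descriptionModels: [String: [String: Any]] = [
    "My GPT-4o": [
        "url": "https://api.openai.com/v1/chat/completions",
        "user_agent": "CloudVision/1.0",
        "model_name": "gpt-4o-mini",
        "request_messages_key": "messages",
        // Merged directly into the request body.
        "additional_parameters": ["max_tokens": 1024, "temperature": 0.7] as [String: Any],
        // Path to the answer text in the response, as given above.
        "response_messages_key": "choices.1.message.content"
    ]
]

// Switching models then just means putting the matching key into description_model.
let descriptionModel = "My GPT-4o"
```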
In conclusion
I hope you find this Shortcut useful. Feel free to leave questions, bug reports and suggestions in the comments below.
Comments
Siri and Chat GPT
For those with a device supporting Apple Intelligence, you can simply ask Siri what is on the screen. Siri will then ask if you are happy for it to send a screenshot to ChatGPT. You can then ask questions. As usual, any responses from ChatGPT will include a copy button. The response is always very quick. No need to save a screenshot or remember gestures. Just ask Siri.