How should I convert images into text before fine-tuning?

Hi there!

I want to fine-tune the model for generating user guides according to UI mockups. The scenario is as follows:

  1. GPT takes in an UI mockup with the question “Write a guide on how to use this mockup”
  2. Afterwards, it generates a detailed user guide (f.e., 1. Click the Open button. 2. Enter your name in the Name field etc.)

As far as I understand, first of all, for this I need to convert the UI mockups into text. Tell me, please, how can I do this? What would you recommend?

Thank you in advance for any guidance!

No vision model can be fine tuned in general release. Only GPT-3.5-turbo, which does not have image input.

The only way to take in an image currently on OpenAI would be if you were to use GPT-4-Turbo vision to perform some GUI to text representation for every user request. If so, it is far more likely that you’d just want to use the GPT-4 model instead of sending its text to a fine-tuned 3.5.


Then: come up with a plan where you can be competitive with the free version of ChatGPT:

continued output from free ChatGPT...

Only missed stuff like message editing and deleting…

How to Use the Application

Starting a Conversation
    Type your question or command in the input field at the bottom of the chat window.
    Press "send + clear" to submit your input.
    The AI assistant will respond in the chat window.

Model Selection
    Click on the dropdown next to "models" to select the desired AI model.
    Different models may offer varying levels of responses and capabilities.

Clearing the Chat
    To clear the current conversation and start fresh, click the "clear hist" button.
    This action will remove all previous messages from the chat window.

Reading System Instructions
    Pay attention to any system messages at the top of the chat window.
    These messages provide context for how the AI is configured for the current session.

Checking Status
    Monitor the status bar at the bottom of the window for information on the application's state.
    Ready status indicates the AI is prepared to handle new inputs.