How should I convert images into text before fine-tuning?

wassabi93 · July 2, 2024, 6:04am

Hi there!

I want to fine-tune the model for generating user guides according to UI mockups. The scenario is as follows:

GPT takes in an UI mockup with the question “Write a guide on how to use this mockup”
Afterwards, it generates a detailed user guide (f.e., 1. Click the Open button. 2. Enter your name in the Name field etc.)

As far as I understand, first of all, for this I need to convert the UI mockups into text. Tell me, please, how can I do this? What would you recommend?

Thank you in advance for any guidance!

_j · July 2, 2024, 6:56am

No vision model can be fine tuned in general release. Only GPT-3.5-turbo, which does not have image input.

The only way to take in an image currently on OpenAI would be if you were to use GPT-4-Turbo vision to perform some GUI to text representation for every user request. If so, it is far more likely that you’d just want to use the GPT-4 model instead of sending its text to a fine-tuned 3.5.

Then: come up with a plan where you can be competitive with the free version of ChatGPT:

continued output from free ChatGPT...

Only missed stuff like message editing and deleting…

How to Use the Application

Starting a Conversation
    Type your question or command in the input field at the bottom of the chat window.
    Press "send + clear" to submit your input.
    The AI assistant will respond in the chat window.

Model Selection
    Click on the dropdown next to "models" to select the desired AI model.
    Different models may offer varying levels of responses and capabilities.

Clearing the Chat
    To clear the current conversation and start fresh, click the "clear hist" button.
    This action will remove all previous messages from the chat window.

Reading System Instructions
    Pay attention to any system messages at the top of the chat window.
    These messages provide context for how the AI is configured for the current session.

Checking Status
    Monitor the status bar at the bottom of the window for information on the application's state.
    Ready status indicates the AI is prepared to handle new inputs.

Topic		Replies	Views
Make OpenAI Vision API Match GPT4 Vision API chatgpt	4	3831	December 6, 2023
Few shots with multiple images API api , lost-user	1	231	January 28, 2025
How to fine-tune ChatGPT for design comparison? API fine-tuning	0	63	October 15, 2024
Question on Finetuning: Can you hardcode images or upload image responses via the image_url subkey of content? API	3	198	August 27, 2024
Customer service assistant inferred from screenshots Community chatgpt	3	569	March 20, 2024

How should I convert images into text before fine-tuning?

Related topics