Image labelling with image-based instructions

I would like to use the API to automatically label my image dataset. I tried the chat completions API with several prompts, but the task is pretty difficult, and it would be useful to show some known examples before the image to be annotated. Is it possible to create an assistant that takes a subset of pre-annotated images and labels new ones? If yes, how?

The technique you would want to try is multi-shot prompting, using chat completions, where you are the one in control of every message.

Besides the system message, you simulate prior user inputs and the desired assistant responses for several chat turns with images, placed before the final turn the AI will actually respond to. That leaves the AI predisposed to follow the patterns and techniques it has just observed.

So the endpoint that I should use is chat completion, right?

Yes. With chat completions you independently make calls that have the required construction, whereas the Assistants API is designed to maintain an ongoing user chat session.

Briefly, multi-shot would look like this shorthand, where you provide each example image in the API’s chat message data format:

system: You are a computer vision image analyst. Follow these rules: {blah blah}
user: label this image {image_example1}
assistant: Circuit board, minor scratches, missing component, good condition.
user: label this image {image_example2}
assistant: Broken gear, worn teeth, metal fragments, requires replacement.
user: label this image {image_example3}
assistant: Control panel, cracked screen, intact buttons, slight discoloration.
user: label this image {image_example4}
assistant: Hydraulic pump, oil leak, rusted bolts, operational status unknown.
user: label this image {image_example5}
assistant: Conveyor belt, torn section, frayed edges, motor functional.
user: label this image {image_under_evaluation}

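For reference, here is a minimal Python sketch of that structure using the openai library; the file names, example labels, and model name are placeholders for your own data:

```python
# Minimal multi-shot labelling sketch with the openai Python library (v1).
# Assumes OPENAI_API_KEY is set and the listed image files exist locally.
import base64
from openai import OpenAI

client = OpenAI()

def image_content(path: str) -> dict:
    """Encode a local image file as a base64 data-URL content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

messages = [
    {"role": "system",
     "content": "You are a computer vision image analyst. Follow these rules: {blah blah}"},
]

# One user/assistant pair per pre-annotated example image
examples = [
    ("example1.jpg", "Circuit board, minor scratches, missing component, good condition."),
    ("example2.jpg", "Broken gear, worn teeth, metal fragments, requires replacement."),
]
for path, label in examples:
    messages.append({"role": "user",
                     "content": [{"type": "text", "text": "label this image"},
                                 image_content(path)]})
    messages.append({"role": "assistant", "content": label})

# Final turn: the image you actually want labelled
messages.append({"role": "user",
                 "content": [{"type": "text", "text": "label this image"},
                             image_content("new_image.jpg")]})

response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
```
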
Examples can lessen the prompting work you must do, although this in-context “learning” is weaker on newer, heavily chat-tuned models that place most of their attention on the latest question.

Thanks, I will try that! The only drawback I see is that for each image under evaluation I must provide all this context, am I right? That means a lot of tokens for a single labeling.

You can set detail: "low" for some or all images. A low-detail image costs under 100 tokens per example (plus the tokens of the response you demonstrate), and it is encoded from a size within 512x512 then.
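In the content-part format from the sketch above, that setting goes inside the image_url object; a sketch of the same helper with a detail parameter added:

```python
# Sketch: "low" caps each image at the small fixed token cost; the API
# downscales the image before encoding it.
import base64

def image_content(path: str, detail: str = "low") -> dict:
    """Encode a local image as a data-URL content part with a detail level."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}",
                          "detail": detail}}
```
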

You cannot use images in a system prompt to give your examples there, and you cannot fine-tune an OpenAI model with images, so if a picture speaks 1000 words of prompt, this is the method left for you.
