How can I restrict ChatGPT to choose image attributes from a list of values?

I have a use case where I need to identify the color of an image from a set of preset values. I want the model to respond only from within the options provided. For example, I want to restrict the response to Pink and not receive Fuchsia or Rose Gold as a response.

Prompting the model to restrict the color value to Pink has not achieved this.

Using logit_bias might also not work, as setting pink, red, and blue all to 100 would lead to hallucination.

I need a 1-2 word reply from within the allowed values. How do I achieve this?

It turns out my AI prompt for writing API tools will also produce a schema that you can provide to the AI as its own output format.

Response: This response JSON schema must be followed precisely as your only output.
- start with {"color":
- only enum listed are allowed
- no markdown container allowed (``` prohibited)

Schema for response output:

```json
{
  "$schema": "http://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "color": {
      "type": "string",
      "enum": ["red", "orange", "yellow", "green", "blue", "indigo", "violet", "black"],
      "description": "The color to be emitted by the tool"
    }
  },
  "required": ["color"]
}
```

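For illustration, here is a minimal sketch of wiring this up in Python, assuming the openai and jsonschema packages; the model name and prompt wording are placeholders, and the image input is omitted for brevity. The schema is pasted into the system message along with the rules above, and the reply is validated programmatically so any deviation is caught:

```python
import json

import jsonschema          # pip install jsonschema
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COLOR_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "color": {
            "type": "string",
            "enum": ["red", "orange", "yellow", "green",
                     "blue", "indigo", "violet", "black"],
        }
    },
    "required": ["color"],
}

system = (
    "Response: this response JSON schema must be followed precisely as your only output.\n"
    '- start with {"color":\n'
    "- only enum values listed are allowed\n"
    "- no markdown container allowed (``` prohibited)\n\n"
    + json.dumps(COLOR_SCHEMA)
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; image input omitted for brevity
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "What color is the gemstone in this image?"},
    ],
)

reply = json.loads(response.choices[0].message.content)
jsonschema.validate(reply, COLOR_SCHEMA)  # raises ValidationError on any deviation
print(reply["color"])
```

A validation failure then becomes a retry signal rather than a silent wrong answer.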
If the colors are all one token, you can set max_tokens to end right where the color is produced and nothing else. On a pure completion model, or an AI like Anthropic's Claude, you can prefill the JSON all the way up to the color value and get just that one token. Even the logprob certainty may be useful.
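A rough sketch of the prefill idea against Anthropic's Messages API, with a placeholder model name: the assistant turn is pre-written up to the value's opening quote, so the model's first sampled tokens are the color itself, and a small max_tokens stops anything further.

```python
from anthropic import Anthropic  # pip install anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=2,  # just enough for the color token(s); tune for your enum
    messages=[
        {"role": "user", "content": "Pick the closest color from: red, orange, "
                                    "yellow, green, blue, indigo, violet, black."},
        # Prefill: the assistant turn is pre-written up to the value,
        # so the model's first sampled token is the color itself.
        {"role": "assistant", "content": '{"color": "'},
    ],
)

print(message.content[0].text.rstrip('"}'))  # e.g. 'blue'
```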


Thanks!!

Let me try this out.

Does it work 100% of the time for you?

I do not create AI that writes only colors.

However, providing the JSON schema as the response format for output is a strong technique: tell the AI that it IS a JSON schema, that its output goes directly to an API, that the response is validated against that schema, and that any deviation from the JSON specification in the output will cause an error for which the AI will be held accountable. This gets you formatted responses and compliance with a list of possibilities without needing any "mode".

The AI is still a token predictor. If you provide a user image, but say "these color names are banned from your output", the JSON might not be followed.

Why wouldn’t you do something more programmatic?

Just extract the RGB values for each pixel, assign them a color label based on nearness to the coordinates of the colors you’re restricting the model to, then simply report the most common color?
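A rough sketch of what I mean, in Python with Pillow and NumPy; the RGB anchor points here are illustrative, not calibrated values:

```python
import numpy as np
from PIL import Image  # pip install pillow numpy

# Illustrative RGB anchor points for the allowed labels -- tune these.
PALETTE = {
    "red":    (220, 40, 40),
    "pink":   (255, 150, 200),
    "blue":   (40, 70, 220),
    "green":  (40, 180, 80),
    "yellow": (240, 220, 60),
    "black":  (20, 20, 20),
}

def dominant_label(path: str) -> str:
    # Flatten the image to an (n_pixels, 3) array of RGB values.
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=float).reshape(-1, 3)
    anchors = np.array(list(PALETTE.values()), dtype=float)       # (k, 3)
    # Squared Euclidean distance of every pixel to every anchor.  (n, k)
    dists = ((pixels[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)                                # label index per pixel
    counts = np.bincount(nearest, minlength=len(PALETTE))
    return list(PALETTE)[counts.argmax()]

print(dominant_label("gem.jpg"))  # placeholder filename
```

Plain Euclidean distance in RGB isn't perceptually uniform; converting to Lab space first would pick labels closer to how humans see the colors.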


The image also has a background. I need to extract the color of the object, which makes the implementation more complex.

Exactly this.

I would not use AI where existing traditional methods are more reliable and deterministic.

Use existing libraries for this.

I use ImageMagick in combo with RMagick and Prizm, but I'm sure there are other options.

e.g. here I’m getting the quantised palette and picking the “dominant” colour.
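Not my RMagick setup, but for anyone not on Ruby, the same quantise-and-count idea looks roughly like this in Python with Pillow; the palette size k is arbitrary:

```python
from PIL import Image  # pip install pillow

def dominant_colour(path: str, k: int = 8) -> tuple[int, int, int]:
    img = Image.open(path).convert("RGB")
    # Quantise to a k-colour palette (median cut), then count palette usage.
    quantised = img.quantize(colors=k, method=Image.Quantize.MEDIANCUT)
    counts = quantised.getcolors()   # [(pixel_count, palette_index), ...]
    _, top_index = max(counts)       # most frequent palette entry
    palette = quantised.getpalette() # flat [r0, g0, b0, r1, g1, b1, ...]
    r, g, b = palette[top_index * 3 : top_index * 3 + 3]
    return (r, g, b)

print(dominant_colour("gem.jpg"))  # placeholder filename
```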


I’m working on gemstone images. Identification and classification of color from the allowed values is one of the requirements. I also need to classify shape from a limited set of values, and cut needs to be classified similarly. Hence, I'm looking for a scalable solution to this classification problem.

@_j I tried the enum solution. The success rate increased from about 40% to 75%, but it's still not entirely accurate across multiple attributes.


Ah, now I see the challenge more clearly.

You potentially have even bigger challenges then?

For example, can you guarantee lighting is identical between images?

If that's the case, I still think deterministic quantisation is worth a try.

That would be difficult, as the lighting varies: some shots are indoors while others are outdoors.


Very challenging setup then. I see why you are trying to explore AI.

I think the problem you have is not one of mere color or obeying rules, but more one of how vision actually works.

[Image: ColorScaleMaster stacked color scale chart]

Ask what color something is, and you get an AI that's been trained on red balloons, green clovers, and yellow moons, using tagged data. It is unlikely that annotators have given anything but "sapphire" or "clear" to gems.

With uneven photographic lighting or unclear color temperature, you still have a problem in identification, which can be as bad as "the dress". That holds even were you to first give the AI thinking space about colors, interpolating between natural-language descriptions of colors, reasoning out the result to produce, or other techniques.

Computer vision can kick bad apples, or eggs with double yolks or meat spots, out of a production line at high speed. However, these systems are not just vision models trained on a single task; they are also operated in a controlled environment.


For more general advice: if the color comes out as "yellow" instead of "N", it may be that N was still among the possible tokens the AI would have produced. With all the training you might do, such as giving example gradings in chat history before the ultimate question, the desired result still might not outrank "beige". However, it could be in the logprobs - the top 20 results the AI can output, accompanied by their certainty levels.

If you extract the logprobs for a 1-token position, you can discard invalid tokens and keep the remaining set. Then, if you are indeed grading on a scale [D, E, F, G…], you can pull out all the available logprobs and apply that weighting. For example, with G=20%, D=10%, F=10%, E=5%, you could figure out what the AI is considering by placing those points, with their weights, into an average. This "multiple instead of top" approach also needs to be accompanied by a re-normalization of the total answer, because even a true top-ranked D will have the other adjacent possibilities pulling the score down somewhat.
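A sketch of that extraction with the Chat Completions logprobs option, assuming the grade letter lands in a single-token position; the model name, prompt, and grade scale are placeholders:

```python
import math
from openai import OpenAI

client = OpenAI()

GRADES = ["D", "E", "F", "G", "H"]                  # placeholder scale, best to worst
GRADE_VALUE = {g: i for i, g in enumerate(GRADES)}  # D=0, E=1, ...

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder
    messages=[{"role": "user",
               "content": "Grade this stone. Answer with one letter, D through H."}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=20,  # the "top 20 results" mentioned above
)

top = response.choices[0].logprobs.content[0].top_logprobs

# Keep only tokens that are valid grades; convert logprob -> probability.
weights: dict[str, float] = {}
for t in top:
    token = t.token.strip()  # tokens may carry leading whitespace
    if token in GRADE_VALUE:
        weights[token] = weights.get(token, 0.0) + math.exp(t.logprob)

# Renormalise over the valid set -- the "re-weight of the total answer".
total = sum(weights.values())
expected = sum(GRADE_VALUE[tok] * p / total for tok, p in weights.items())

print(f"valid probability mass: {total:.2f}, weighted grade index: {expected:.2f}")
```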

Are the remaining errors misclassifications with regard to the actual color of the objects, or is the model not using the enum as instructed?

What is an acceptable threshold for your solution?

Are there certain characteristics of the failure cases that could be used in a second step to either identify misclassifications automatically or maybe correct them immediately?

Are you performing preprocessing in any way, shape or form?

Lots of options are listed here: The best library for structured LLM output – Paul Simmering

Not sure I’d recommend this … this is very much a legacy API, and in fact I’m not even certain you can call text-davinci-003 anymore?

Well there’s your answer! :smiley:

Be careful of using ChatGPT to make recommendations, it tends to hallucinate and use out of date information!

Just copying the output here without checking is not good practice; you are going to confuse someone or waste their time.


“Be careful of using ChatGPT to make recommendations, it tends to hallucinate and use out of date information!”
Yes, that was what I hoped would be noticed, along with the failings of the product.

“Just copying the output to here without checking is not good practice, you are going to confuse someone or waste their time.”
Indeed! You are completely right: not everyone is going to see the superfluousness of the solution given by GPT, which I neglected to consider. Point taken; I will take that into consideration in future!


So for us, the acceptable threshold is about 95%.

The main concern is that the model is not using the enum as instructed. For example, while the enum has Pink as an option and no Rose Gold, GPT still outputs Rose Gold.

Like I said, since there are multiple such attributes, having a secondary step is not scalable, especially since I don't know all the possible values GPT might respond with.

I’m not performing preprocessing in any way.

Thank you for sharing this. Let me go through it and see if this can help in any way.