Using the model with a pre-defined set of answers

Hello everyone! I have been blown away by the playground, but I'm looking for some fine-tuning guidance.

Can I use embeddings or fine-tuning to give a language model a specific set of answers?

Here is a sample case to illustrate what I’d like to do in categorization/classification:

Let’s say I want OpenAI to tell me the color of an object. The prompt would be something like “What is the Pantone color of the sky?” There are 1826 Pantone colors.

What if I wanted my answer to come from only 400 of these Pantone colors, with the model returning the closest color label and NOT any of the other 1426?

Can this be achieved? And what general steps would need to be taken to complete this?

For the record, I’m not trying to classify colors, but I figured this would illustrate well what I’m trying to achieve.

Hi @colin.rathbun

I think there are two ways to approach this problem.

  1. Fine-Tuning
    You can try the fine-tuning API. It will probably give you what you are looking for, but depending on your context, the results can always be improved and will never be 100% accurate.

  2. Semantic Search + Prompt Engineering before the API request
    You can use the Embeddings API to match the user’s question against your stored content and then put your ideal answer into the prompt. For example, you can do the following…

If the user asks about Pantone colors, tell them that we only have the following 400 colors and see our link at https://....

Q: What kind of pantone colors do you have?
A:

For instructions on how to do semantic search, you can refer to this.
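
The prompt layout above could be assembled in code along these lines. This is only a minimal sketch: `build_prompt` and its parameter names are hypothetical, and the context string is assumed to be whatever text the semantic search step returned.

```python
def build_prompt(context: str, question: str) -> str:
    """Put the retrieved instruction/context first, then the user's question.

    `context` is assumed to come from the semantic search step;
    the Q:/A: layout mirrors the example prompt in this post.
    """
    return f"{context.strip()}\n\nQ: {question.strip()}\nA:"


# Hypothetical usage: the context here is hand-written, not retrieved.
prompt = build_prompt(
    "If the user asks about Pantone colors, tell them that we only have "
    "the following 400 colors.",
    "What kind of pantone colors do you have?",
)
print(prompt)
```

The resulting string is what you would send as the prompt in the API request.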

Thanks @nelson.

  1. For the fine-tuning, do I just give 400 examples, each with one of the 400 Pantone colors? Maybe more?

  2. For the semantic search, that prompt does not ask for or get the answer I’m looking for (unless I am missing something). I’m looking for a single Pantone color response… Only one of the codes I have provided…

@colin.rathbun

  1. It really depends on your use case. I suspect you are going to need more than 400 records; see this example here…
{"prompt": "What are your most popular yellow paints for bathroom", "completion": "Our most popular paints are Yellow, Pantone 101 and Pantone 102"}
{"prompt": "What is your top selling paint this year.", "completion": "Our top seller is Pantone 101"}
  2. The semantic search is done using the Embeddings API, so there is going to be some programming involved. The idea is that you can store and search documents very quickly and feed them into the prompt along with the actual question. It’s a bit more complicated than the fine-tuning API, but it gives you a lot more flexibility.
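
To make the search idea concrete, here is a rough sketch of picking the closest label from a fixed list by cosine similarity. Everything in it is an assumption for illustration: the three-number vectors are toy stand-ins for real embeddings (which you would get from the Embeddings API and store ahead of time), "Hypothetical Blue" is a made-up label, and the function names are my own.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_label(query_vec: Vector, label_vecs: Dict[str, Vector]) -> str:
    """Return the allowed label whose vector is most similar to the query."""
    if not label_vecs:
        raise ValueError("label_vecs must contain at least one label")
    return max(
        label_vecs,
        key=lambda name: cosine_similarity(query_vec, label_vecs[name]),
    )


# Toy vectors standing in for stored embeddings of the allowed labels.
ALLOWED_LABELS = {
    "Pantone 101": [0.9, 0.9, 0.1],
    "Pantone 102": [1.0, 0.8, 0.0],
    "Hypothetical Blue": [0.2, 0.6, 0.9],
}

# Toy vector standing in for the embedding of "the sky".
sky_vec = [0.3, 0.6, 1.0]
print(closest_label(sky_vec, ALLOWED_LABELS))  # prints "Hypothetical Blue"
```

Because the result is always one of the dictionary keys, this step can never return a label outside your list, which is the constraint asked about earlier in the thread.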

Sounds like a very interesting project. Good luck, and let me know if you run into any other roadblocks.