How do I ensure my agent only returns a single-letter code?

I am frustrated with my assistant on GPT-3.5-Turbo. I attached a JSON file with two fields: 1) Questions, 2) Category Code. When a message is submitted, I want the assistant to return a code from the set {A, B, C, D, E, F, G, H}, with H reserved for “nothing found”. But instead, I always get a long sentence reply with the letter code somewhere in it, which is usually correct. It works better in GPT-4, but the costs are much higher for what is a simple lookup.

Is there a way for me to give better instructions to the assistant? I would even be happy if the long answer were output, provided that the code is enclosed in braces, e.g. {A} instead of “A”. Do you think the instructions can be improved?

  1. Purpose: I am a dispatcher assistant using the data in the uploaded file “[t2b.json]” to categorize user prompts.

  2. The Data: The file contains two fields: “Prompts” and “Category Code”. The Category Code is a single letter from the set {A, B, C, D, E, F, G}, while the prompts are typical questions from users.

  3. Define the Task: My task is to match the user’s question to the corresponding “Prompts” field in the file. If a match is found, I should return only the single-letter category code from that entry.

  4. Handle No Match: If no exact match is found, I should return the letter “H” to indicate “Nothing Found”.

  5. Response Format: Please ensure my responses are limited to single letters from the set {A, B, C, D, E, F, G, H} and avoid any additional text or prompts. Enclose the Category Code in braces {}, e.g., {A}, {B}, {C}, {D}, {E}, {F}, {G}, {H}.

  6. Example Usage:

Prompt: What training and onboarding support does Acme Company offer to new subscribers? Response: {E}

Prompt: Tell me a joke. Response: {H}


Might be a good case for a smaller fine-tuned model. You’d need to create a dataset, but it would likely not need many examples…

Before you go that far, maybe try giving it a one-shot or even two-shot example in the system prompt OR as a user/assistant pair.


You could potentially save a lot of money by converting this into local code that uses embeddings to match prompts.

Embeddings would only need to be retrieved for a library of “examples”, each of which could be pre-classified with your codes. Then you would simply compare your runtime query vector with these examples via vector search and return the associated classification of the closest example.
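The lookup described above can be sketched in a few lines of plain Python. This is a toy illustration only: the three-dimensional vectors and the 0.5 threshold below are made-up stand-ins for real embedding vectors (e.g. from an embeddings model) and a tuned cutoff.

```python
import math

# Hypothetical pre-classified examples. In practice each vector would come
# from an embeddings model, not be hand-written like these toy stand-ins.
EXAMPLES = [
    {"vector": [0.9, 0.1, 0.0], "code": "A"},
    {"vector": [0.1, 0.9, 0.0], "code": "B"},
    {"vector": [0.0, 0.1, 0.9], "code": "C"},
]

SIMILARITY_THRESHOLD = 0.5  # below this, fall back to the catch-all "H"

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def classify(query_vector):
    """Return the code of the closest pre-classified example, or 'H' if
    nothing is similar enough."""
    best_code, best_score = "H", SIMILARITY_THRESHOLD
    for example in EXAMPLES:
        score = cosine_similarity(query_vector, example["vector"])
        if score > best_score:
            best_code, best_score = example["code"], score
    return best_code
```

At runtime you would embed the user's query once, call `classify` on the resulting vector, and return the code; no chat model is involved at all.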


I think you are correct. I am going to look into embeddings. So let me make sure I understand.

  1. I have a spreadsheet of 150 questions, categorized into seven categories: A, B, C, D, E, F, and G. I want to create a fine-tuned triage assistant that returns one, and only one, code from {A–H}, with H being the catch-all “I don’t know”.
  2. I tried uploading a JSONL file, but I get an error saying it is not in the prompt/completion pair format, even though it is. Here are 3 rows:

{"Prompt":"Compare deforestation trends over the past decade.","Completion":"The answer is {A}"}
{"Prompt":"Explore flexible pricing models for emerging businesses.","Completion":"The answer is {B}"}
{"Prompt":"Showcase success stories from businesses that have subscribed to GreenAnt.","Completion":"The answer is {C}"}

Ideally, I just want the single-letter return, but someone said the completion field was too short, so I encapsulated it and planned to extract it in Python afterwards.
I would like to get it working ASAP and then explore word embeddings. But I think I had better start looking into them now. Thanks.
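For what it's worth, pulling the brace-wrapped code out of a longer reply is a one-liner in Python. A minimal sketch (the function name and the fallback to "H" are my own choices):

```python
import re

def extract_code(reply: str) -> str:
    """Pull the first {X} category code out of a model reply.

    Falls back to the catch-all "H" when no braced code is present.
    """
    match = re.search(r"\{([A-H])\}", reply)
    return match.group(1) if match else "H"
```

So even if the model rambles, `extract_code("The answer is {B}")` still yields just `"B"`.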

Yes, I created a dataset, but regardless of size, fine-tuning (or file storage) will not accept my JSONL file. This is the error I get, followed by sample data:

"There was an error uploading the file: Unexpected file format, expected either prompt/completion pairs or chat messages."
{"Prompt":"Compare deforestation trends over the past decade.","Completion":"The answer is {A}"}
{"Prompt":"Explore flexible pricing models for emerging businesses.","Completion":"The answer is {B}"}
{"Prompt":"Showcase success stories from businesses that have subscribed to GreenAnt.","Completion":"The answer is {C}"}
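A likely cause of that error is the capitalized keys: the legacy fine-tuning format expects lowercase "prompt" and "completion", and gpt-3.5-turbo fine-tuning expects the chat format with a "messages" array. A sketch of converting the records shown above into the chat format (the record list here is just the sample rows from the post):

```python
import json

# The failing records, with the capitalized keys as posted.
records = [
    {"Prompt": "Compare deforestation trends over the past decade.",
     "Completion": "The answer is {A}"},
    {"Prompt": "Explore flexible pricing models for emerging businesses.",
     "Completion": "The answer is {B}"},
]

def to_chat_line(record):
    """Convert one Prompt/Completion record into a chat-format JSONL line
    (a "messages" array with user and assistant turns)."""
    return json.dumps({
        "messages": [
            {"role": "user", "content": record["Prompt"]},
            {"role": "assistant", "content": record["Completion"]},
        ]
    })

jsonl = "\n".join(to_chat_line(r) for r in records)
```

Writing `jsonl` to a file should give you something the uploader recognizes as chat messages.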

To achieve this you do not need to call the Assistants or chat completions endpoint, only the embeddings endpoint, to get the query vector.

This assumes you’ve pre-seeded each example you have with an embedding vector you can match against.

The embeddings API may be all you need for such a classification system, and the bonus is that it's very fast and much cheaper (especially compared to the Assistants API!).


The failure is in the style of prompting: how the AI is addressed.

Here’s a playground preset that even gpt-3.5-turbo, now a low bar, can complete.

https://platform.openai.com/playground/p/MFEIneM2cf4pwHtoeJrwaip5?model=gpt-3.5-turbo&mode=chat

You can leave in a few successful user/assistant messages before your ultimate question as examples.

You didn’t provide what these prompt categories could be, so I gave some examples. Another set of categories:

System message

You are a user input categorizer, a backend processor that decides which of eight specially trained AI agent categories a user input prompt is sent to:

// Categories and descriptions

categories = {
"A": "Data Retrieval",
"B": "Process Optimization",
"C": "Skill Development",
"D": "Entertainment Curation",
"E": "Health Assessment",
"F": "Economic Forecasting",
"G": "Interpersonal Communication",
"H": "Nothing Found"
}

// Output format

Output format must be only valid JSON, with the key "category", and the value taken from the keys of "categories" above.

// Example output

{"category": "H"}
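On the receiving side, a reply constrained to this JSON shape is easy to validate in code. A minimal sketch; the fallback to "H" for unparseable or out-of-range replies is my own choice, not part of the preset:

```python
import json

VALID_CODES = set("ABCDEFGH")

def parse_category(reply: str) -> str:
    """Parse the model's JSON reply and return a validated category code."""
    try:
        code = json.loads(reply).get("category", "")
    except json.JSONDecodeError:
        return "H"  # unparseable reply: fall back to the catch-all
    return code if code in VALID_CODES else "H"
```

This also guards against the model inventing a letter outside the allowed set.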


Robert, thanks so much. I think I got it working in Python thanks to some ChatGPT-generated code. I owe you a big one. My internet is down, but I'll send you a better thank-you later.


This was the perfect solution. I am using the "all-mpnet-base-v2" sentence transformer; it was easy to code against, and it works fast and cheaply. Thank you so much!


Thanks for coming back to let us know. May your thread help someone in the future…

Hope you stick around!


I really appreciate everyone’s input and support. I hope that someday I will be able to return the favour and contribute.
