Model output is not matching training data with 1,200+ specific names

Hi guys

So I’ve been stuck with a very specific problem for a while now.

I’m creating a platform where I use OpenAI’s API to create structured programs for people. For these programs I have a list of around 1,200 items to be used in the program. Each item has to be written by the OpenAI model exactly as it is named on my list, without changes.

The problem I’m having is that the model keeps making changes to these names: forgetting words, swapping words around, and things like that. When that happens, I’m unable to match the names on my list with the modified names in the model’s output, which causes a lot of problems.

Since my prompt was around 9,000 tokens, I thought the reason was that my prompt was too long (I also tried telling the model in various ways to only use the names on the list in my prompt). So I decided to fine-tune. I created a fine-tuned model with a training and validation file of around 30 examples and more than 100,000 tokens, and got these decent numbers after several tries: training loss 0.0946, validation loss 0.1487, full validation loss 0.3442.

In the training files I’m very specific about the model only using the correct names, and in the example programs I’ve also used a large portion of the correct names, both in examples with and without “weight”.

But now when I use the trained model, I’m right back where I started. The fine-tuned model is of course better at some things, which is good, but it keeps using whatever names it wants and not the exact names I gave it in the training files.

What can I do, guys?

Simon

Hi and welcome to the Forum!

The experience you are having with the fine-tuned model is due to the fact that new information is not retained during fine-tuning. It’s therefore not a suitable solution in your case.

Could you share a few more details about what specifically you are trying to accomplish? That would help in pointing you in the right direction.

3 Likes

Hi @jr.2509

Thank you!

So I’m using it to create custom workout programs, and I need the exercise names in the model’s output to match the 1,200 exercise names on my list, because I need to be able to match the names in the output to illustrations in my database with those specific 1,200 names.

For example, an exercise named “Sled Leg Press” will just be called “Leg Press”, or an exercise named “Dumbbell Incline Fly” will be called “Incline Dumbbell Fly” by the model, so I can’t match the output with the correct exercise illustration from my database…

2 Likes

Thanks for the additional information.

The approach somewhat depends on what exactly your prompt and your desired output look like for a given API call.

If you need names to be exact, then using phrases like “return the name verbatim” in your instructions frequently helps.

You can also look at post-processing strategies whereby you rely on embeddings-based similarity search to match the model output with the specific exercise names, as opposed to exact matching.
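
Not tested on your data, but a minimal sketch of that matching step could look like the following (the exercise list, embedding model choice, and use of numpy are assumptions on my end):

import numpy as np
from openai import OpenAI

client = OpenAI()

# Stand-in for your canonical list of 1,200 names.
CANONICAL = ["Sled Leg Press", "Dumbbell Incline Fly", "Barbell Straight Leg Deadlift"]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Embed the canonical list once and cache the normalized vectors.
canonical_vecs = embed(CANONICAL)
canonical_vecs /= np.linalg.norm(canonical_vecs, axis=1, keepdims=True)

def closest_name(model_output_name):
    # Map a possibly altered name from the model back to the closest canonical name.
    v = embed([model_output_name])[0]
    v = v / np.linalg.norm(v)
    scores = canonical_vecs @ v
    return CANONICAL[int(np.argmax(scores))]

print(closest_name("Incline Dumbbell Fly"))  # expected: "Dumbbell Incline Fly"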

To get more specific guidance, you might want to share an example prompt and model response - it’s quite possible that I am not appreciating the complexity of what you are trying to achieve and that my initial ideas may not be fit for purpose.

5 Likes

Thank you for your answer!

So I’ve tried adding multiple different sentences to my prompt like “return the name verbatim”, but without luck. The model completely ignores this.

Post-processing strategies like similarity search would be hard, since a lot of exercises have very similar names and there are so many of them, so the matching might pick something wrong.

Hmm, I’ve tried short prompts and long prompts, and everything else works perfectly. The model understands what format the output should be in, and it understands that it should build the output based on a list of dynamic user values that are inserted into the prompt. But it just can’t get the names right.

Could RAG be a solution? Or could a solution be to retrain the model on new training data where I include all 1,200 exercises? But that would take a very long time.

Maybe there is a solution I’m not aware of.

2 Likes

Could you share an example prompt (it can be redacted as needed)? It’s a bit difficult to give concrete guidance without seeing the structure of your prompt. There could be a lot of factors contributing to the issue :slight_smile:

3 Likes

Have you tried numbering your exercises, giving them IDs? :thinking:

Then CoT: → describe exercise → get ID (or ID candidates) → get exercise name
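
Very roughly, the resolution step could look like this (IDs and names here are made up; the point is that the exact string always comes from your own table, never from the model):

# Hypothetical ID-to-name table; the model is only ever asked to output IDs.
EXERCISES = {
    101: "Sled Leg Press",
    102: "Dumbbell Incline Fly",
    103: "Barbell Straight Leg Deadlift",
}

# The list you put in the prompt: "101: Sled Leg Press", "102: Dumbbell Incline Fly", ...
prompt_list = "\n".join(f"{eid}: {name}" for eid, name in EXERCISES.items())

# Suppose the model responds with IDs; you resolve the exact names yourself,
# so they can never be misspelled or reordered.
model_output_ids = [102, 101]  # hypothetical model response
program = [EXERCISES[eid] for eid in model_output_ids]
print(program)  # ['Dumbbell Incline Fly', 'Sled Leg Press']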

If you share your prompt as @jr.2509 suggested, the community can try to give you more concrete advice :slight_smile:

3 Likes

Here is an example of my prompt:

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a personal trainer"
    },
    {
      "role": "system",
      "content": "Build a workout program in JSON with the fields X, Exercises, X, X. Structure the workout program like this example: (example)"
    },
    {
      "role": "user",
      "content": "(Instructions about how the workout program should be built based on a number of dynamic input fields)"
    }
  ]
}

I’ve removed some personal details from the prompt, like the fields besides “Exercise”, the example of how the output should be structured, and the instructions based on the user’s inputs. The things I hid are nothing that could interfere with the way the model writes the exercise names, and the exercise names in the example structure are also written correctly.

In the model field I’m also swapping gpt-4o for my trained model’s name, but neither the base model nor my trained model gives me the correct output with the exercise names.

Before I trained the model, I had this part in the prompt as well, before the structure instruction: “Use these exercises and their exact name without changes: Air Bike, Alternate Heel Touchers, Alternate Lateral Pulldown…(1200 exercises here)”

I’ve tried a variety of ways to tell it to use the correct names without changes, but nothing helps.

Now I have the full list in the training data, and I have also tried in my prompt to refer to the exercises in the training data, or to the list that the model already knows, but also without luck.

I’ve also just now tried to insert all the names in a way that makes the model see each full name as its own thing, both in the prompt and in the training file when I retrained the model, but that doesn’t work.

1 Like

Honestly, it feels like fine-tuning is not even working. If I remove my example of how the workout program should be structured from my prompt, it gives me the output in the wrong format, even though I gave it 30-40 examples with the correct structure in the training files.

No full guarantee, but you could try a version of the system prompt as follows (note that there are some gaps that need to be filled):

You are a personal trainer, responsible for creating personalized workout programs tailored to individual user needs. For the creation of the program, you are provided with a set of inputs: [include a list and description of the inputs here - this should also cover an overview of the dynamic user inputs]. Your response consists of a JSON with the fields [replace with details on the JSON schema]. In the JSON you must strictly reference the exercises from the list provided by their ID and full name (verbatim).

Append to this instruction the list of exercises. As suggested, given the number of exercises there is merit in adding an ID for each in addition to the full name.

Is there any way you can filter out the relevant exercises based on the user’s input, so as to reduce the list of exercises appended to the system prompt for a given user query?
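
For illustration, assuming you can tag each exercise with metadata such as muscle group and equipment (these fields are hypothetical - substitute your own tagging scheme), the filtering could be as simple as:

# Hypothetical exercise metadata.
EXERCISES = [
    {"id": 101, "name": "Sled Leg Press", "muscles": {"quads"}, "equipment": "sled"},
    {"id": 102, "name": "Dumbbell Incline Fly", "muscles": {"chest"}, "equipment": "dumbbell"},
    {"id": 103, "name": "Barbell Straight Leg Deadlift", "muscles": {"hamstrings"}, "equipment": "barbell"},
]

def relevant_exercises(target_muscles, available_equipment):
    # Keep only exercises matching the user's goals and available equipment.
    return [
        e for e in EXERCISES
        if e["muscles"] & target_muscles and e["equipment"] in available_equipment
    ]

# Build the exercise list for the system prompt from the filtered subset
# instead of all 1,200 names.
subset = relevant_exercises({"chest", "quads"}, {"dumbbell", "sled"})
prompt_list = "\n".join(f"{e['id']}: {e['name']}" for e in subset)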


To conclude, I don’t think you need fine-tuning for this use case.

You can control the output structure by adding an output example as you currently do or by taking advantage of the new structured outputs feature.

As for the naming conventions, the model likely struggles due to the sheer number of exercises. If you can, creating a dynamic system prompt that only includes the most relevant exercises based on the user input could help to overcome this.

1 Like

Can I ask: are you just giving the model a list of 1,200 names and asking it to generate 1,200 workout programs? That’s way too many things to ask the model to do at once. You’ll get way better results if you make 1,200 calls and pass in each name separately.

1 Like

Consider using structured outputs with JSON schemas for exact outputs. That said, 1,200 items may be too many, and you may need to break the problem up into something simpler.

1 Like

As a general rule… the longer the output, the less predictable it’s going to be. These models are always looking for ways to compress their output, so if the output starts getting too long, the model is going to look for ways it can compress it: dropping words, rephrasing things, etc.

1 Like

I don’t think it’s necessary to include all that information in the system prompt. Right now the model understands everything in the prompt perfectly the way it is, except the exercise names.

About adding IDs for each exercise name: wouldn’t that just make the prompt even longer and thereby even harder for the model to read?

But the fine-tuning I’ve done until now wouldn’t hurt to use, right?

I think the only option I would have to minimize the number of exercises and filter them out would be to maybe remove 200 of the 1,200, then take the remaining 1,000 and divide it into an 800/200 split, and the 800 portion might still be too long. I could of course try to manually edit all the names and make them shorter, but right now they have the correct names, and calling them something else could come across as unprofessional.

What about something like RAG? I’m not sure I understand it fully, but wouldn’t that allow me to reference a list of the exercises in my prompt, which the model would read before generating an output?

No. It’s just reading the 1,200 names for each prompt and generating one program.

I’m already using structured JSON outputs. That works, but it doesn’t solve the incorrect-name issue.

RAG isn’t going to help you. RAG is about determining what you want to show the model, and you already know what you want to show the model… 1,200 names. The issue you’re running into is the model’s natural desire to compress its answer. I could argue that every answer generated by the model is just an exercise in compression.

I’m still not fully visualizing the task that you’re trying to perform. Could you show a short example, using 2 or 3 of the exercise names, of the pattern you’re expecting out?

1 Like

Okay, thank you, then RAG is out of the way.

I can give you a few examples here. Let’s say I generate a program with a bunch of exercises. There are always between 5% and 25% of the exercises that are named incorrectly compared to the names on my list. In rare cases they are all named correctly.

For example:
“Sled Leg Press” will be turned into “Leg Press” (missing a word)
“Dumbbell Incline Fly” will be called “Incline Dumbbell Fly” (words swapped around)
The model can also write an exercise named “Barbell Stiff Leg Deadlift”, which doesn’t even exist on my list, though I do have similar exercises like “Barbell Straight Leg Deadlift” and “Dumbbell Stiff Leg Deadlift”

The post was just hidden by the automated spam filter but is up again now.

Confirming you have tried the recently released Structured Outputs feature (https://platform.openai.com/docs/guides/structured-outputs/structured-outputs-vs-json-mode) and not the older JSON mode?

If you are using Structured Outputs and your schema validation isn’t working, maybe try sharing that? In theory, you should be able to define a validation schema that uses an enumeration with your specific exercise names.
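
As a sketch (untested at this scale - you would need to verify that a 1,200-value enum stays within the schema size limits), it could look like:

from openai import OpenAI

client = OpenAI()

# Stand-in for your full list of 1,200 canonical names.
EXERCISE_NAMES = ["Sled Leg Press", "Dumbbell Incline Fly", "Barbell Straight Leg Deadlift"]

schema = {
    "type": "object",
    "properties": {
        "exercises": {
            "type": "array",
            # The enum constrains every exercise to an exact name from your list.
            "items": {"type": "string", "enum": EXERCISE_NAMES},
        }
    },
    "required": ["exercises"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a personal trainer."},
        {"role": "user", "content": "Build a short chest workout."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "workout", "schema": schema, "strict": True},
    },
)
print(response.choices[0].message.content)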

1 Like