How "force" the API to really follow all the instructions?

I’m using the API (now gpt-3.5-turbo, before davinci) and sometimes I get good results, but other times it ignores my instructions or even does the complete opposite. This happens even with easy stuff like “give the results in Portuguese” (sometimes it gives me the results in English) or “show all the results in uppercase” (it gives me other formats).

I’ve tried different prompts, using the system message or the user message (with gpt-3.5-turbo), giving long or short prompts…
I’ve also tried two different models (gpt-3.5-turbo and davinci). Davinci seems a bit less stubborn, but it’s old and going to be deprecated.

Any ideas or hacks to make the AI really do what I tell it to do?

Maybe prompt engineering, changing the API configuration/parameters, using a different model/API or something else?

I can’t use this API if it doesn’t follow my instructions (at least the easy ones).

Hi and welcome to the developer forum!

Can you post some prompts (with translations if required) along with outputs generated by the model that demonstrate the problem?

The chat AI will be predisposed by the first language it sees to continue in that language, and it will be hard to break it away from that decision. So if you can write a very articulate system prompt and user input prompt in the language in which it will be operating, it will function that much better.
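
For example, a minimal sketch assuming the pre-1.0 `openai` Python package (the Portuguese wording and the coffee-shop task are just illustrations):

```python
import openai  # pip install "openai<1.0"

# Writing the system prompt itself in Portuguese primes the model to stay
# in Portuguese, instead of instructing it in English to switch languages.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Você é um redator de marketing. Responda sempre em português."},
        {"role": "user",
         "content": "Escreva um título de anúncio para uma cafeteria."},
    ],
)
print(response["choices"][0]["message"]["content"])
```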

“All Uppercase” is actually a challenging task, because that is not how the language the AI was trained on works. If that is a regular part of your text processing, you might submit a second request that uses the AI to modify the high-quality natural language (if it is not simply something you can do with a little subroutine, as sketched below).
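
For the uppercase requirement in particular, the “little subroutine” can be a single Python string method applied after the model responds, which is deterministic in a way the model never will be:

```python
# Let the model write natural, mixed-case copy, then post-process it yourself.
model_output = "Fresh coffee delivered daily"  # stand-in for the API response text
print(model_output.upper())  # -> FRESH COFFEE DELIVERED DAILY
```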

That is my exact problem. GPT-3.5 just does not reliably follow the instructions no matter where I place them. It makes it almost useless. I would rather pay for the expensive GPT-4 API, but it is not available.

The models are not instruction following models, they are chat models.

Use davinci or wait for the new instruct model to be released.
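
For reference, davinci-style models go through the completions endpoint rather than the chat one. A minimal sketch, assuming the pre-1.0 `openai` package and the instruction-tuned `text-davinci-003` (the prompt is a placeholder):

```python
import openai

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Give the results in Portuguese.\n\nHeadline for a coffee shop ad:",
    max_tokens=100,
    temperature=0,
)
print(response["choices"][0]["text"])
```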

These models are not general-purpose computers. They are not algorithm followers. They are stochastic parrots, more like a Markov chain or a Mad Libs generator than like a scripting language.

The way the models work is, approximately:
“Assume that the following text is found at the beginning of a web page on the Internet. What would be the next text to follow?”
When training, the model gets reinforced when it predicts close to what actually follows, and gets suppressed where it doesn’t.

If you read a random post on Facebook, it won’t necessarily follow all the requests or instructions in the first post. Neither will this model, because that’s not a prediction of typical human text.

Thank you for the hint, but I need at least 8K of context for every request. Speaking of chat models, Claude does a much better job of following the instructions in the chat window than GPT-3.5.
I think my instructions are quite reasonable, and GPT-4 follows them correctly, at least in ChatGPT.

Question to the OP: how are you bolting together your implementation? I find it a little curious that you’re experiencing such wide variance from your prompting. In my experience you can get fairly consistent responses provided your prompt is:

  1. Concise
  2. Precise
  3. Always top-of-mind to the model.

Are you using something like LangChain? Because there are some pitfalls to avoid where the abstraction may have you unwittingly kicking your prompt down a well.

I had success with adding “Output in ${language}” at the very bottom of the system prompt, then of course choosing the language you want dynamically.

Then finish the user prompt by adding this line: “Output:”
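
A minimal sketch of how those two tips fit together when building the messages (the variable names and the ad-copy task are mine):

```python
language = "Portuguese"  # chosen dynamically, e.g. from a form field

system_prompt = (
    "You are a marketing professional who writes ad copy.\n"
    f"Output in {language}"  # the language line goes at the very bottom
)
user_prompt = (
    "Write one ad headline for a coffee shop.\n"
    "Output:"  # the user prompt ends with this line
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
```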

Please note that GPT-3.5 is not part of the InstructGPT models. OpenAI doesn’t make an explicit mention of which models these are, but it points out that they follow complex instructions much better. GPT-4, for example. Here you go => Aligning language models to follow instructions

@elmstedt Is GPT-4 an instruct model or a chat model?

@b0rked_rebase Not using LangChain or anything fancy, just API calls using Python.

Not sure how I can make phrases like “give the results in Portuguese” more concise and precise…

Also, not sure what you mean exactly by “Always top-of-mind to the model”. Can you please clarify and/or provide an example?

Thanks!

@yassinerajallah Thanks, I will try the “output” solution.

A bit weird that they give data about “InstructGPT” models but do not specify what those models are. Is GPT-4 Instruct and GPT-3.5 not?

Do the API configuration and parameters (temperature, frequency_penalty, etc.) have some impact on the ability to follow instructions precisely?

Or is it all just about the prompts and the model you choose?

Yep, try it. Make sure you drop the temperature and top_p to 0 to get a deterministic output. You can increase them bit by bit if you want more diversified output.
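
With the pre-1.0 `openai` package, that looks like this (the prompt content is a placeholder):

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give the results in Portuguese: ..."}],
    temperature=0,  # same prompt -> (almost) the same output
    top_p=0,        # raise these gradually for more varied output
)
```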

Regarding the instruct model, no one knows, to my knowledge. From experience, GPT-4 performs much better than GPT-3 at following my complex instructions. Usually, when I find myself needing GPT-3 for something advanced, I use a chain of prompts.

Essentially, when you set a system prompt for a conversational model like GPT-3.5 or GPT-4, the model will ‘lose’ context or adapt to the conversational history as the conversation evolves. The best way to get consistent responses is to feed the system prompt back with each call.
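
In code, “feed the system prompt back with each call” just means rebuilding the message list from the system prompt every time (the `history` variable and helper name are my own):

```python
SYSTEM_PROMPT = {
    "role": "system",
    "content": "You are a marketing professional. Always answer in Portuguese.",
}

history = []  # prior user/assistant turns, trimmed to fit the context window

def build_messages(new_user_message: str) -> list:
    # The system prompt is re-sent at the top of every request, so the
    # instructions stay fresh no matter how long the conversation gets.
    return [SYSTEM_PROMPT, *history, {"role": "user", "content": new_user_message}]
```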

Thanks for the tip, but I’m not really using it as a conversational API; I only send one message/prompt per topic/task.

Have you tried few-shot examples? Leading by example, in my experience, has much better outcomes than explicit instructions.
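
A sketch of few-shot prompting in the chat format: instead of describing the output shape, show it with a couple of fabricated example turns (the ad-copy examples are placeholders):

```python
messages = [
    {"role": "system", "content": "You write one uppercase ad headline per request."},
    # Worked examples: the model imitates the pattern it sees here.
    {"role": "user", "content": "Product: running shoes"},
    {"role": "assistant", "content": "RUN FARTHER EVERY DAY"},
    {"role": "user", "content": "Product: coffee subscription"},
    {"role": "assistant", "content": "FRESH BEANS AT YOUR DOOR"},
    # The real request comes last, in the same shape as the examples.
    {"role": "user", "content": "Product: yoga classes"},
]
```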

Also, if you want something in a different language, you may be better off letting GPT finish the prompt as it wants and then performing the translation task afterwards. It certainly helps with quality.
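
A sketch of that two-step chain with the pre-1.0 `openai` package: generate first, then make a second call whose only job is translation (the `ask` helper is mine):

```python
import openai

def ask(prompt: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]

# Step 1: let the model write freely, in whatever language it is strongest in.
copy = ask("Write one ad headline for a coffee shop.")

# Step 2: a separate call that only translates.
translated = ask(f"Translate the following into Portuguese, keeping the tone:\n{copy}")
```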

I can’t imagine that immediate jumps in language help the quality.

Jamming / bunching instructions that don’t relate to each other usually doesn’t work out as well as separating them.

Chat. It uses the /chat/completions/ endpoint.

@RonaldGRuckus Mine is not a chat app; I just provide one message to the API (including some user input), and I need it to follow the instructions on the first try.
Not sure if I should use Davinci (old but apparently instruction-oriented), the new chat-oriented models (GPT-3.5/4), or just forget about OpenAI and look for different models/companies.

@Foxalabs Sure, here is my current prompt:

You are a marketing professional. Write ad copy based on this context: "{context}"
Create the ads in {language} (don’t use other languages).
Use this writing style: {tone}.
Provide 10 headlines (30 characters maximum) and 8 descriptions (90 characters maximum, 50 minimum) that are very different from each other.
Don’t use exclamation marks, question marks, or other symbols.
Give the response as HTML. “HEADLINES” and “DESCRIPTIONS” as headings (H2) and the lists as numbered lists (ol).
Don’t include explanations or anything else, only the headings and lists.
Follow these instructions precisely and really count the characters.

The things between curly braces are user inputs (they fill in or select things in a form).

With this prompt and gpt-3.5-turbo, it always follows the HTML format (surprisingly, I thought that was the difficult part). But it doesn’t really count the characters, it doesn’t always apply the language, and it never applies the selected format (uppercase, capitalized case…) to all the texts (sometimes it does for one part), even if I give examples and explain exactly how that format works.
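
Since the model can’t reliably count characters, one option is to validate the lengths in code after the call and regenerate only the items that fail. A sketch assuming the HTML layout my prompt asks for (the helper names and the sample response are mine):

```python
import re

def extract_items(html: str) -> dict:
    """Pull the numbered-list items out of the model's HTML response,
    assuming the <h2> + <ol> layout requested in the prompt."""
    sections = {}
    for heading, block in re.findall(r"<h2>(.*?)</h2>\s*<ol>(.*?)</ol>", html, re.S):
        sections[heading.strip()] = re.findall(r"<li>(.*?)</li>", block, re.S)
    return sections

def invalid_items(items, max_len, min_len=0):
    """Return the items that break the character limits."""
    return [i for i in items if not (min_len <= len(i.strip()) <= max_len)]

response_html = (
    "<h2>HEADLINES</h2><ol><li>Fresh Coffee Daily</li></ol>"
    "<h2>DESCRIPTIONS</h2><ol><li>Locally roasted beans delivered to your "
    "door every single morning</li></ol>"
)  # stand-in for the API output

sections = extract_items(response_html)
bad = invalid_items(sections.get("HEADLINES", []), max_len=30)
bad += invalid_items(sections.get("DESCRIPTIONS", []), max_len=90, min_len=50)
# Re-request or fix only the offending items instead of trusting the model to count.
```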

Now I’m thinking: since the response I want is HTML code, maybe there is a better code-oriented solution for this (by OpenAI or other companies)?
The difficult part here is not the code but the text inside it, so I’m not sure whether it’s better to use text-oriented or code-oriented solutions.