createChatCompletion - Reduce token consumption from instructions

I am testing the OpenAI API with the gpt-3.5-turbo model. As far as I understand, every request must include instructions such as “You are an AI that will generate random questions about X topic, and if there are inappropriate words, take X action,” and so on. These instructions are what consume the most tokens in the long run. Is there a way to keep them from consuming tokens on every request, since they never change?

My code:

const { Configuration, OpenAIApi, ChatCompletionRequestMessageRoleEnum } = require("openai");

const configuration = new Configuration({ apiKey: process.env.OPENAI_API_KEY });
const openai = new OpenAIApi(configuration);
const messages = [
  {
    role: ChatCompletionRequestMessageRoleEnum.System,
    content: AIInstructions // ---> Instructions sent with every request
  }
];

Source: Response from GPT-4
It’s true that instructions consume tokens, and repeating the same instructions in each API call adds up. However, gpt-3.5-turbo uses the system message to set the model’s behavior, and there is no way to send it once and reuse it later, so it is included (and billed) in every request. You can minimize its cost with the following suggestions:
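
To see what that instruction actually costs, you can count its tokens locally before sending it. A minimal sketch, assuming the gpt-3-encoder npm package (a separate tokenizer package, only an approximation of gpt-3.5-turbo’s tokenizer) and an example instruction string:

// Count the tokens in the system instruction with a local tokenizer.
const { encode } = require("gpt-3-encoder");

const AIInstructions = "You are an AI that will generate random questions about X topic, and if there are inappropriate words, take X action.";

console.log(`Instruction costs roughly ${encode(AIInstructions).length} tokens per request`);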

Suggestions

  1. Make the instruction as concise as possible without losing important context.
  2. If a conversation involves multiple messages, send the whole conversation in a single API call instead of making a separate call (each with its own copy of the instruction) for every message (see the sketch after this list).
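
For suggestion 2, here is a minimal sketch of sending the accumulated conversation in one createChatCompletion call rather than one call per message. It assumes the same openai v3 setup as in the question (openai and ChatCompletionRequestMessageRoleEnum already in scope), and the user/assistant turns are placeholder examples:

// Inside an async function: one request carries the system instruction
// plus the whole history, instead of repeating the instruction per call.
const conversation = [
  {
    role: ChatCompletionRequestMessageRoleEnum.System,
    content: "Generate random questions about X topic, and avoid inappropriate words."
  },
  { role: ChatCompletionRequestMessageRoleEnum.User, content: "Give me a question about space." },
  { role: ChatCompletionRequestMessageRoleEnum.Assistant, content: "What is the largest planet in the Solar System?" },
  { role: ChatCompletionRequestMessageRoleEnum.User, content: "Another one, please." }
];

const response = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: conversation
});
console.log(response.data.choices[0].message.content);

The history itself still counts toward the token total, but the system instruction appears only once per request instead of once per separate call.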

Inspirational Questions

  1. How can we optimize the conversation structure to make the most of the token limit without sacrificing context or quality of the AI’s responses?
  2. Are there any other aspects of the API call that can be optimized to further reduce token usage?

Code Modification

To make your code more token-efficient, try using a concise instruction in your system message:


const openai = new OpenAIApi(configuration);
const messages = [
  {
    role: ChatCompletionRequestMessageRoleEnum.System,
    content: "Generate random questions about X topic, and avoid inappropriate words." // Concise instruction
  }
];

Remember that making the instruction too brief may result in less context for the AI, which could affect response quality. It’s essential to find a balance between token usage and providing enough context for accurate and relevant responses.