Hello,
I am using the OpenAI Assistants API to send messages to an assistant and have run into a problem with token usage. Each call seems to consume about 4,000 tokens, which puzzles me, since my messages are about 50 tokens and the responses are typically around 200 tokens. The assistant has lengthy instructions, and I wonder whether they are what drives the token consumption.
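Here is a rough way to see where ~4,000 tokens could come from. It assumes that every run sends the model the assistant's instructions plus the thread's entire message history as input tokens, not just the new message. The function name and all token figures are hypothetical. The estimate also ignores extra overhead such as message formatting, tool definitions, and retrieved file chunks:

```python
def prompt_tokens_per_run(instruction_tokens, message_tokens, reply_tokens, turns):
    """Rough input-token estimate for each run when all turns share one thread.

    Hypothetical model: each run's input = instructions + every earlier
    message and reply in the thread + the new user message.
    """
    per_run = []
    history = 0  # tokens of earlier user messages and assistant replies
    for _ in range(turns):
        # The instructions are counted again on every run, not just the first.
        per_run.append(instruction_tokens + history + message_tokens)
        # After the run, this turn's message and reply become part of the history.
        history += message_tokens + reply_tokens
    return per_run


if __name__ == "__main__":
    # Hypothetical figures: 3,500-token instructions, 50-token messages, 200-token replies.
    print(prompt_tokens_per_run(3500, 50, 200, turns=3))
```

With those hypothetical figures the three runs come to 3,550, 3,800, and 4,050 input tokens. The instructions dominate every call, and reusing one thread adds the earlier turns on top.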
To give you a better picture, here’s the process my code follows:
- Check for an existing thread ID; if there is none, create a new thread.
- Use a static assistant ID hard-coded in my code.
- Generate the input (around 200-400 tokens).
- Add a message to the thread.
- Execute the thread (Run command).
- Check the status of the run.
- Retrieve the steps of the run.
- Finally, once the status is “completed”, retrieve the thread messages to read the response, handling cases where the response is not yet available or the run ended in an error.
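The steps above can be sketched with the official `openai` Python package (v1.x) and its Assistants beta endpoints. This is a minimal sketch, not my actual code. The function name `ask_assistant`, its parameters, and the polling timeout are hypothetical. Listing the run steps is left out because it mainly matters for tool calls. The sketch also assumes `run.usage` (prompt, completion, and total tokens for the run) is populated once the run finishes:

```python
import time


def ask_assistant(client, assistant_id, text, thread_id=None,
                  poll_seconds=1.0, timeout=60.0):
    """Hypothetical helper: send one message, wait for the run, return the reply.

    Returns (reply_text, thread_id, usage); `client` is an openai.OpenAI() instance.
    """
    # Reuse an existing thread, or create a new one.
    if thread_id is None:
        thread_id = client.beta.threads.create().id
    # Add the user's message to the thread.
    client.beta.threads.messages.create(thread_id=thread_id, role="user", content=text)
    # Start a run of the assistant on the thread.
    run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)
    # Poll until the run leaves the queued/in-progress states or the timeout expires.
    deadline = time.monotonic() + timeout
    while run.status in ("queued", "in_progress", "cancelling"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"run {run.id} is still {run.status}")
        time.sleep(poll_seconds)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
    # Anything other than "completed" (failed, expired, requires_action, ...) is an error here.
    if run.status != "completed":
        raise RuntimeError(f"run {run.id} ended with status {run.status}")
    # Newest message first: after a completed run this is the assistant's reply.
    messages = client.beta.threads.messages.list(thread_id=thread_id, order="desc", limit=1)
    reply = messages.data[0]
    parts = [part.text.value for part in reply.content if part.type == "text"]
    return "\n".join(parts), thread_id, run.usage
```

Called as `ask_assistant(OpenAI(), ASSISTANT_ID, prompt)`, logging `usage.prompt_tokens` next to `usage.completion_tokens` for each call would show how much of the bill is input (instructions plus thread history) rather than the reply itself.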
My specific question is: how can I reduce token usage with the OpenAI API, given how long my instructions are? When I shortened the instructions, I was billed for fewer tokens, but the responses were no longer as good. Is there a way to keep long instructions from increasing the cost of every call? I would like to understand whether the instructions are billed on every call, contributing to the high token usage, and, if so, how to avoid being billed for this “training” on every call.
Any advice or shared experiences, particularly from those who have dealt with similar situations or have in-depth knowledge of OpenAI’s token billing system, would be greatly appreciated.
Thank you in advance for your assistance.