I want to create an assistant that acts as an annotator for a specific dataset. I have a guidelines document of around 3.5K words (~5K tokens) with very specific instructions on how to annotate the dataset.
I create the assistant with a plain-text version of those guidelines in the 'instructions' field. Then I want to run the assistant N times, sending only the specific document it must annotate each time (on average, ~60 tokens).
The thing is that the first call to the assistant on a thread consumes the tokens for the guidelines, which I understand. But on every subsequent call, the API usage metadata shows all previous messages accumulating: each request consumes as input tokens the guidelines plus ALL previous reviews.
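A rough back-of-the-envelope model makes the accumulation visible. The 5K-guideline and 60-token-document figures are from the post; the ~100-token average annotation reply is an assumption for illustration, and per-call formatting overhead is ignored.

```python
# Rough model of input-token usage when all N documents share one thread.
# Figures from the post: ~5K tokens of guidelines, ~60 tokens per document.
GUIDELINES = 5000
DOC = 60
REPLY = 100  # assumed average annotation length (hypothetical)

def input_tokens_for_call(n):
    """Input tokens for the n-th call (1-indexed) on a single shared thread:
    guidelines + all previous documents and replies + the new document."""
    history = (n - 1) * (DOC + REPLY)
    return GUIDELINES + history + DOC

def total_input_tokens(n_calls, per_call):
    return sum(per_call(i) for i in range(1, n_calls + 1))

# 100 documents on one shared thread vs. a fresh thread per document:
single_thread = total_input_tokens(100, input_tokens_for_call)   # grows quadratically
fresh_thread = total_input_tokens(100, lambda n: GUIDELINES + DOC)  # grows linearly

print(single_thread)  # 1298000
print(fresh_thread)   # 506000
```

So on a single shared thread, the history term dominates quickly, which matches the usage metadata described above.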
What is the correct approach here? How can the instructions be processed only once, so that each new request consumes tokens only for the new document?
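One way to stop the history (though not the instructions) from being re-billed is to give each document its own short-lived thread, and/or to cap how much thread history a run may include via the Runs API's `truncation_strategy` parameter. A minimal sketch, assuming a hypothetical `ASSISTANT_ID`; the dicts are the payloads you would pass to the official SDK calls (`client.beta.threads.create` and `client.beta.threads.runs.create`), with no network calls made here:

```python
# Sketch: fresh-thread-per-document pattern for the Assistants API.
# ASSISTANT_ID is a placeholder for an assistant created once, with the
# ~5K-token guidelines in its 'instructions' field.
ASSISTANT_ID = "asst_..."

def annotation_request(document_text):
    """Build payloads for annotating one document on its own thread,
    so no earlier documents or annotations are carried as input tokens."""
    thread_payload = {
        "messages": [{"role": "user", "content": document_text}],
    }
    run_payload = {
        "assistant_id": ASSISTANT_ID,
        # Belt and braces: even on a reused thread, include only the
        # most recent message in the run's context.
        "truncation_strategy": {"type": "last_messages", "last_messages": 1},
    }
    return thread_payload, run_payload

thread_payload, run_payload = annotation_request("Document text ...")
```

Note this only removes the accumulated history; the guidelines themselves are still part of every run's input, since the model re-reads them on each request.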
Hi, thank you for your response! The issue here is that the guidelines are consumed on every request. If I understood correctly, they are a system prompt, and therefore should not count toward token usage on each request. But they do.
How can I make the assistant aware of the guidelines without them being consumed on each request?