I’m using the Assistants API for classifying Play Store reviews. I’ve given some instructions to the Assistant so that it classifies the reviews accordingly. When I run it in iterations — say there are 500 rows of reviews and I run them in 10 batches of 50 per batch — will the system prompt (the instructions) get counted as tokens (i.e., billed credits) for every batch?
Yes, your system prompt is included in the token count every time you run inference.
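A back-of-the-envelope sketch of what that repetition costs. All token counts below are hypothetical round numbers — measure your actual prompt with a tokenizer such as tiktoken:

```python
# Rough cost arithmetic: the instructions are billed as input tokens on every request.
# All counts are hypothetical placeholders, not measured values.
instruction_tokens = 400   # size of your classification instructions (hypothetical)
tokens_per_review = 60     # average review length (hypothetical)
reviews_per_batch = 50
batches = 10

per_batch_input = instruction_tokens + reviews_per_batch * tokens_per_review
total_input = batches * per_batch_input
instruction_overhead = batches * instruction_tokens

print(total_input)           # total input tokens across all 10 batches
print(instruction_overhead)  # portion that is just the repeated instructions
```

With these placeholder numbers, the instructions account for 4,000 of the 34,000 input tokens — noticeable, but small next to the reviews themselves.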
The Assistants API is meant more for continuing a chat session than for batch input/output operations, since a thread keeps the memory of a conversation. If you keep adding messages to the same thread, costs inflate even more than just repeating the instructions, because the entire growing conversation is resent with each run.
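The thread-growth effect can be sketched with the same kind of arithmetic — each run resends the instructions plus all prior messages, so cumulative input grows roughly quadratically. Token counts are again hypothetical:

```python
# Hedged sketch: each run on an Assistants thread resends the instructions plus
# the entire prior conversation. All token counts are hypothetical round numbers.
instruction_tokens = 400
message_tokens = 60   # each review sent as a user message (hypothetical)
reply_tokens = 20     # each classification reply (hypothetical)

thread_total = 0
history = 0
for turn in range(10):  # 10 reviews sent one by one on a single thread
    thread_total += instruction_tokens + history + message_tokens
    history += message_tokens + reply_tokens  # the thread keeps growing

# Compare with stateless requests that carry no history:
stateless_total = 10 * (instruction_tokens + message_tokens)
print(thread_total, stateless_total)
```

Even at 10 turns the thread costs nearly twice the stateless approach in input tokens, and the gap widens with every added message.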
If you are using it solely because it has a document extractor, be aware that you are at least doubling your costs. The entire conversation and system prompt are sent once for a model run that attempts to search a vector store built from document chunks; the conversation is then internally resubmitted with the retrieved chunks appended for another model run. This continues autonomously until the AI decides no further internal operations are needed and finally writes a response to you.
The Chat Completions API format is more efficient and performant, and it can also be genuinely batched with OpenAI’s Batch API — a multi-job endpoint with a 24-hour return window — for 50% savings.
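For the Batch API route, you prepare a JSONL file with one request per line and upload it. A minimal sketch of building those lines for review classification — the JSONL shape (`custom_id` / `method` / `url` / `body`) follows OpenAI’s Batch API docs, while the model name, instructions, and reviews here are placeholders:

```python
import json

# Sketch: build Batch API input lines for chat-completions classification.
# Instructions, model name, and reviews are illustrative placeholders.
instructions = "Classify the Play Store review as positive, negative, or neutral."
reviews = ["Love this app!", "Crashes on startup."]

lines = []
for i, review in enumerate(reviews):
    request = {
        "custom_id": f"review-{i}",           # your key for matching results back
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",           # placeholder model name
            "messages": [
                {"role": "system", "content": instructions},
                {"role": "user", "content": review},
            ],
        },
    }
    lines.append(json.dumps(request))

jsonl = "\n".join(lines)  # write this to a .jsonl file and upload it for batching
print(len(lines), "requests prepared")
```

Note the system prompt still appears in every request body, so it is still billed per review — but the 50% batch discount applies to all of it.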