Assistants API Cost Exceeds Reasonable Expectations

I began testing an Assistants API-based chatbot that references a tabular document in JSON (~3K lines, 8 columns). The instructions document is around 600 words.

Attached is an image showing two conversation threads that resulted in 1M+ tokens. Is there a more efficient approach to getting the Assistants API to reference data in a table, or do I just have to reduce the size of the data?

You don’t have direct control over the AI’s techniques for gathering knowledge from uploaded documents. Here are some of the Assistants retrieval tool’s usage instructions:

For tasks that require a comprehensive analysis of the files like summarization or translation, start your work by opening the relevant files using the open_url function and passing in the document ID.
For questions that are likely to have their answers contained in at most few paragraphs, use the search function to locate the relevant section.

Think carefully about how the information you find relates to the user’s request. Respond as soon as you find information that clearly answers the request. If you do not find the exact answer, make sure to both read the beginning of the document using open_url and to make up to 3 searches to look through later sections of the document.

The AI uses web-browser-like navigation to make continued calls to get sections of documents, and to back out of a document and into another:

back() Returns to the previous page and displays it. Use it to navigate back to search results after clicking into a result.
scroll(amt: int) Scrolls up or down in the open page by the given amount.
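Taken together, the quoted instructions imply a browser-like tool surface roughly like the following. This is a hypothetical reconstruction: OpenAI does not publish the internal tool schema, and the signatures and return types here are guesses inferred from the instruction text above.

```python
# Hypothetical reconstruction of the retrieval tool surface implied by
# the quoted instructions. Not an actual OpenAI interface.

class RetrievalBrowser:
    def open_url(self, document_id: str) -> str:
        """Open a document by ID and return the first visible section."""
        ...

    def search(self, query: str) -> list[str]:
        """Return snippets from sections likely to contain the answer."""
        ...

    def back(self) -> str:
        """Return to the previous page, e.g. back to search results."""
        ...

    def scroll(self, amt: int) -> str:
        """Scroll up or down in the open page by the given amount."""
        ...
```

Each of these calls is a tool invocation the model makes on its own, and each one adds its result to the thread before the next iteration runs.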

Every internal tool_call the AI makes, instead of responding to you, also carries the full context of the thread conversation loaded to its maximum, plus every prior tool-call result, the tokens of your instructions, and these unseen tool specifications.

Let the AI loose with a 125k-context model it can fill with tokens on its own, let OpenAI “delegate” a task with this system programming, and a single run can cost you $5–$10+.
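A back-of-the-envelope sketch of that arithmetic. The per-token prices and the number of internal iterations below are illustrative assumptions, not actual OpenAI rates; the point is only that re-sending a near-full context on every internal iteration multiplies the bill.

```python
# Rough cost model: every internal tool-call iteration re-sends the
# (near-full) thread context as input tokens. Prices are illustrative
# assumptions, not current OpenAI rates.

def run_cost(context_tokens: int, iterations: int,
             input_price_per_1k: float = 0.01,
             output_tokens_per_step: int = 500,
             output_price_per_1k: float = 0.03) -> float:
    input_cost = iterations * context_tokens / 1000 * input_price_per_1k
    output_cost = iterations * output_tokens_per_step / 1000 * output_price_per_1k
    return input_cost + output_cost

# A filled 125k context re-sent across 7 internal tool-call iterations:
print(round(run_cost(125_000, 7), 2))
```

With those assumed prices, seven iterations over a full context already lands in the $5–$10 range described above.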

That’s not even considering creative user input that tells the AI to continue iterating uninterrupted using the user’s own techniques (the way you’d exploit ChatGPT Plus).

Conclusion: this endpoint is expensive.


Is there a direct correlation between the size of the file and the tokens used, and is there documentation on what that correlation is?

Doesn’t it seem like it’s a bit of a “blank check” to OpenAI if you actually implement an assistant?

Conversely, the price per unit of work of the Assistants API is a multiple of what I understand to be essentially the same thing: MS Copilot or OpenAI’s own consumer subscription.

The “documentation” is not forthcoming about how retrieval actually works. Guidance on the typical run steps and how the context is loaded is omitted. “Trust us bro.”

It is only by experimentation, and by dumping out the internal function text, that one can come up with some answers – like your 800k tokens of experimentation across two chats.

The number of tokens billed depends on how useful the retrieved data is for answering. If the first document has the answer in its initial text, and that document is the one automatically dumped into the AI’s context, then the AI may answer in a single turn.

If the document is short and can all fit in context, the worst might be another search upon the same document, because the AI doesn’t know what search will reveal.

If multiple document file names all sound promising, but the AI explores and explores without finding something to cite, you might be looking at high costs; it is the model’s context length that sets the ceiling on how much can be spent per internal iteration.


Would it stand to reason, then, that one could build the chatbot to default to 3.5, or some model cheaper than 4.5, to answer general questions, while also instructing the 3.5 model in its prompt to return a specific result when it doesn’t have an answer? That result would then trigger the chatbot script to call the more advanced, more expensive model.
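That routing pattern can be sketched like this. The sentinel string, the stub “model” callables, and their replies are all hypothetical placeholders; in practice the two callables would wrap real chat-completion calls to the cheap and expensive models.

```python
# Two-tier model routing sketch: a cheap model answers by default and is
# instructed (hypothetically) to reply with the literal token "ESCALATE"
# when it doesn't have an answer, triggering a call to the pricier model.

ESCALATE = "ESCALATE"

def route(question: str, ask_cheap, ask_expensive) -> str:
    """ask_cheap / ask_expensive are callables wrapping the two models."""
    answer = ask_cheap(question)
    if answer.strip() == ESCALATE:
        return ask_expensive(question)
    return answer

# Stub callables standing in for real API calls:
cheap = lambda q: "Paris." if "capital" in q else ESCALATE
expensive = lambda q: "Detailed answer from the larger model."

print(route("What is the capital of France?", cheap, expensive))
print(route("Summarize my 3,000-line table.", cheap, expensive))
```

The open question with this design is reliability: the cheap model has to consistently emit the sentinel instead of guessing, which is itself a prompt-engineering problem.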