Issues with High Token Usage in Assistants API for Chatbot Responses

I've built a chatbot that answers customer questions on behalf of our company. We have a large set of questions and answers related to a ski resort. However, there's one issue, and I need some help.

I'm using the Assistants API and have uploaded a file with the questions and answers. When I ask these questions in various forms, the responses are good. But I noticed in the Playground that a single response uses approximately 10,000 tokens.

Could you please advise if I am doing something incorrectly and provide some tips on how to optimize this?


Hey, I saw your post and just wanted to drop a quick thought here – maybe it’ll help.

You're definitely not doing anything "wrong" per se – the retrieval tool just tends to inject a lot of context when you work with large files. If you've uploaded your full Q&A dataset as a single file, every response may be pulling in chunks from the whole thing, on top of your instructions and conversation history. That's what eats up tokens fast.
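If I remember right, the v2 Assistants API also lets you cap how many retrieved chunks get injected per run, and how much thread history is carried along. I'm writing the exact parameter names from memory, so verify them against the current OpenAI docs – here's the rough shape of the options as a plain dict, so no live API call is needed to see it:

```python
# Hedged sketch: per-run options I believe the v2 Assistants API accepts
# (double-check the exact names in the current OpenAI docs). Built as a
# plain dict so the payload shape is visible without an API call.
run_options = {
    # Only carry the most recent messages of the thread into the run.
    "truncation_strategy": {
        "type": "last_messages",
        "last_messages": 4,
    },
    # Cap how many retrieved chunks file_search injects per response.
    "tools": [{
        "type": "file_search",
        "file_search": {"max_num_results": 5},
    }],
}

# You'd pass these as keyword arguments to
# client.beta.threads.runs.create(...) alongside thread_id/assistant_id.
print(run_options["truncation_strategy"]["last_messages"])
```

Lower values for both knobs mean fewer tokens per response, at the cost of the assistant seeing less context.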

What worked for me was breaking the data down into smaller topic-specific files (e.g., “Opening Hours,” “Parking Info”), and only referencing what’s needed at the moment.
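To make that concrete, here's a minimal sketch of the routing idea: pick the topic file before the request goes out, so only the relevant file is attached. The topic names and keywords below are made-up examples, not anything from your dataset:

```python
import re

# Hypothetical topic files mapped to trigger keywords. In practice
# these would be the file IDs you uploaded per topic.
TOPIC_FILES = {
    "opening_hours.txt": ["open", "hours", "close", "closing"],
    "parking_info.txt": ["parking", "park", "car", "garage"],
}

def pick_topic_file(question: str, default: str = "general_faq.txt") -> str:
    """Return the Q&A file whose keywords best match the question."""
    words = re.findall(r"[a-z]+", question.lower())
    best_file, best_hits = default, 0
    for filename, keywords in TOPIC_FILES.items():
        hits = sum(1 for kw in keywords if kw in words)
        if hits > best_hits:
            best_file, best_hits = filename, hits
    return best_file

print(pick_topic_file("What time do the lifts open?"))  # opening_hours.txt
```

A plain keyword match like this is crude, but even a rough router keeps the retrieval tool from dragging the whole dataset into every response.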

Also worth looking at is limiting how much previous context you carry forward. You don’t always need the full history – just enough for the assistant to stay coherent.
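If you manage the history yourself, trimming it is a few lines. This sketch assumes messages stored as `{"role": ..., "content": ...}` dicts (the common Chat Completions shape – adapt it to however you store yours):

```python
# Sketch: keep the system message (if any) plus only the last few
# turns, so each request carries a bounded amount of history.
def trim_history(messages, keep_last=4):
    """Return system messages plus the last `keep_last` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

history = [
    {"role": "system", "content": "You answer ski resort questions."},
    {"role": "user", "content": "When do lifts open?"},
    {"role": "assistant", "content": "At 8:30."},
    {"role": "user", "content": "And on weekends?"},
    {"role": "assistant", "content": "At 8:00."},
    {"role": "user", "content": "Is parking free?"},
]
trimmed = trim_history(history, keep_last=2)  # system + 2 latest turns
```

Tune `keep_last` to the smallest value that still keeps the assistant coherent for your conversations.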

I’m building a more modular chatbot system myself (nothing fancy, just structure-focused), so if you’re curious to compare notes, happy to share.
Either way – your project sounds like it’s already got good bones.

Keep it up!