Assistants API is Killing Me

You’re not doing anything wrong.

As others have suggested, a less expensive LLM such as GPT-3.5 is an option: because you're relying on RAG, the retrieved context does most of the heavy lifting and you don't need as much raw power from the model.
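
To illustrate, the model choice is just one parameter in a Chat Completions call, so the downgrade is a one-line change. This is a minimal sketch only; the prompt strings and model name are placeholders, not a recommendation for your exact setup:

```python
# Minimal sketch: swapping in a cheaper model for the RAG answering step.
# Prompt contents here are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # cheaper model; the retrieved context carries most of the weight
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "Context:\n<retrieved chunks>\n\nQuestion: <user question>"},
    ],
)
print(response.choices[0].message.content)
```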

IMHO, the most effective option would be to chunk the document yourself, embed the chunks, and import them into a Pinecone vector DB (at this size it will be small enough to run on the free tier). That would substantially reduce costs and potentially increase accuracy, depending on how well the document's structure lends itself to chunking and embedding.
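
Here's a rough sketch of that flow, assuming a Pinecone serverless index named "docs" with dimension 1536 (to match text-embedding-3-small); the chunk size, file name, and question are placeholders you'd adjust to your document:

```python
# Sketch of chunk -> embed -> upsert -> query with Pinecone and OpenAI embeddings.
# Index name "docs", chunk sizes, and the sample document/question are assumptions.
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("docs")  # pre-created index, dimension 1536

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size character chunking with overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

document = open("manual.txt").read()  # placeholder source document
chunks = chunk_text(document)

# Embed all chunks (one call is fine for a small document; batch if it's large)
# and keep the raw text in metadata so it can be returned at query time.
embeddings = client.embeddings.create(model="text-embedding-3-small", input=chunks)
index.upsert(vectors=[
    {"id": f"chunk-{i}", "values": e.embedding, "metadata": {"text": chunks[i]}}
    for i, e in enumerate(embeddings.data)
])

# At query time: embed the question, pull the top matches, and build the context
# string you pass to the (cheaper) chat model.
question = "How do I reset the device?"  # placeholder question
q_emb = client.embeddings.create(model="text-embedding-3-small", input=[question]).data[0].embedding
matches = index.query(vector=q_emb, top_k=5, include_metadata=True).matches
context = "\n\n".join(m.metadata["text"] for m in matches)
```

The upside of doing the chunking yourself is that you control chunk boundaries and top_k, which is where most of the cost and accuracy trade-off lives with a structured document.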
