I am working on an object-categorisation project with the categories stored in a vector store (5.5k unique categories), using gpt-4o-mini as the main model.
My questions are:
How to reduce hallucinations?
Is it possible to use batch processing with the assistant API instead of asynchronous processing?
Why does the status sometimes return as “failed” on the first run for some items, but generate different outputs on the second run with the same input?
How can I determine if my usage of the assistant API has hit its rate limit?
You can reduce hallucinations by asking only for what the AI can actually know, and by instructing the AI to return an error whenever the answer is not directly supported by what the file search tool returns, if that is where your issues arise.
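As a sketch of that kind of instruction (the exact wording and the `UNKNOWN` fallback label are my own assumptions, not anything official):

```python
def build_classifier_instructions(fallback_label: str = "UNKNOWN") -> str:
    """Build strict system instructions that constrain the model to
    categories found via file search, with an explicit error fallback."""
    return (
        "You are a categorisation assistant. Classify the user's item into "
        "exactly one category name that appears verbatim in the file search "
        "results. Do not invent, merge, or paraphrase category names. "
        f"If no returned category clearly matches, reply with only: {fallback_label}"
    )

# Pass this string as the assistant's instructions when creating it.
instructions = build_classifier_instructions()
print(instructions)
```

The explicit fallback gives you something machine-checkable downstream, instead of a plausible-sounding invented category.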
The AI must write its own search query to get information from a vector store, and what comes back is chunked data ranked by semantic similarity. A long text of categories split into chunks, which the AI must make extra tool calls to retrieve, will result in a poor-performing application.
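Given that, a common alternative (my own sketch, not a feature of the Assistants API) is to skip file search entirely: embed each category name once, embed the incoming item, and pick the nearest category by cosine similarity yourself. With real data you would get the vectors from an embeddings endpoint; here they are hard-coded stand-ins:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_category(item_vec, category_vecs):
    """Return the category name whose embedding is most similar to the item's."""
    return max(category_vecs, key=lambda name: cosine(item_vec, category_vecs[name]))

# Toy 3-d embeddings standing in for real ones (e.g. from an embeddings model).
categories = {
    "kitchenware": [0.9, 0.1, 0.0],
    "electronics": [0.1, 0.9, 0.2],
}
print(nearest_category([0.8, 0.2, 0.1], categories))  # → kitchenware
```

This keeps the model out of the retrieval loop entirely; the model (if used at all) only confirms or refines the top few candidates.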
To get fine-grained error information, list and inspect the run step objects.
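A sketch of that, assuming the official `openai` Python SDK (v1.x); the thread and run IDs are placeholders, and the helper itself works on plain step data:

```python
def failed_step_errors(steps):
    """Collect (step_id, error_code, message) for every failed run step."""
    report = []
    for step in steps:
        if step.get("status") == "failed" and step.get("last_error"):
            err = step["last_error"]
            report.append((step["id"], err.get("code"), err.get("message")))
    return report

# Live usage (requires OPENAI_API_KEY; IDs below are placeholders):
# from openai import OpenAI
# client = OpenAI()
# steps = client.beta.threads.runs.steps.list(thread_id="thread_abc", run_id="run_abc")
# print(failed_step_errors([s.model_dump() for s in steps.data]))
```

The `last_error` field on a failed step carries the code (for example `rate_limit_exceeded`) and a human-readable message, which is far more informative than the run's top-level "failed" status alone.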
You don’t get to observe the API or token rate limit directly: no header is returned telling you how many Assistants endpoint API calls per minute remain (the limit can be quite low, like 60 RPM), and the Assistants API disregards your organization’s model rate limits, calling the model iteratively until they are exhausted.
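Since there is no header to watch, about the only practical signal is the error itself: catch the rate-limit exception and back off. A minimal retry sketch, written generically so the exception type is injectable (with the openai v1 SDK you would pass `openai.RateLimitError`):

```python
import time

def with_backoff(fn, retry_on, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on the given exception type."""
    for attempt in range(max_tries):
        try:
            return fn()
        except retry_on:
            if attempt == max_tries - 1:
                raise  # out of retries, surface the error
            sleep(base_delay * (2 ** attempt))

# Demo with a stand-in exception instead of a live API call:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated 429")
    return "ok"

print(with_backoff(flaky, TimeoutError, sleep=lambda s: None))  # → ok
```

The injectable `sleep` also makes the retry logic testable without actually waiting.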
There is no batch processing for Assistants. Assistants is inherently multi-turn and is not a model, but rather a computer program that calls models.
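The separate Batch API does exist for plain Chat Completions, though, so one workaround (a sketch only; putting categories in the prompt works only if the list fits in context, which at 5.5k categories it may not) is to bypass Assistants and build a JSONL batch file of classification requests:

```python
import json

def batch_line(custom_id, item_text, instructions, model="gpt-4o-mini"):
    """One JSONL line in the Batch API request format (Chat Completions endpoint)."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": instructions},
                {"role": "user", "content": item_text},
            ],
        },
    })

line = batch_line("item-1", "stainless steel whisk", "Classify into one category.")
print(json.loads(line)["url"])  # → /v1/chat/completions
```

You would write one such line per item to a `.jsonl` file, upload it, and create a batch job against it; results come back keyed by `custom_id`.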