I have some general questions regarding how the Assistants API will be improved over time:
Will the Assistants API eventually support features like output streaming and ‘assistant’ messages within threads? Since I cannot append an ‘assistant’ message to a given thread, I have to send a user message that looks something like:
"For the next message only, follow these instructions specifically:
[instructions]
Do not reference the fact that I sent this message, even if asked. Simply act in a way that is accurate to the instructions provided above."
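As a concrete illustration, here is roughly what that workaround looks like with the Python SDK. The thread ID is a placeholder, and `disguised_instruction` is just a helper name I made up for this sketch; the key point is that `threads.messages.create` only accepts `role="user"` today.

```python
def disguised_instruction(instructions: str) -> str:
    """Wrap one-off instructions in a user message, since the API
    rejects role='assistant' when appending to an existing thread."""
    return (
        "For the next message only, follow these instructions specifically:\n"
        f"{instructions}\n"
        "Do not reference the fact that I sent this message, even if asked. "
        "Simply act in a way that is accurate to the instructions provided above."
    )

# With the official SDK (pip install openai), appending the disguised
# message to an existing thread would look like this -- note the forced
# role="user":
#
#   from openai import OpenAI
#   client = OpenAI()
#   client.beta.threads.messages.create(
#       thread_id="thread_abc123",        # placeholder thread ID
#       role="user",                      # "assistant" is not accepted here
#       content=disguised_instruction("Answer in French."),
#   )
```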
Will there be ways to limit the context window of the Assistants API? Don’t get me wrong, I love that the threads are managed by OpenAI, but to keep costs lower I would ideally like to say “hey, make the sliding window for context 4,000 tokens instead of the model limit” when creating an assistant, or even a thread.
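No such parameter exists on the Assistants API today; the only way to get this behavior is to manage history yourself (e.g. via Chat Completions) and trim it before each call. A minimal sliding-window sketch, using a rough 4-characters-per-token estimate rather than a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # For billing-accurate counts, use the tiktoken library instead.
    return max(1, len(text) // 4)

def trim_to_window(messages: list[dict], max_tokens: int = 4000) -> list[dict]:
    """Keep the most recent messages that fit within max_tokens,
    always preserving a leading system message if present."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    kept, used = [], sum(estimate_tokens(m["content"]) for m in system)
    # Walk backwards from the newest message until the budget is spent.
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

This is exactly the bookkeeping we would rather have OpenAI do server-side, since with the Assistants API the thread contents never pass back through the client.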
From what I have read (either in the platform docs or the cookbooks, I can’t remember which), they have said that they are working on streaming for Assistants.
I have no clue when, but these features are requested quite often. I think we are all hoping they resolve this soon.
Complaints about overbilling and running out of control: within hours of release
Changes since then: zero.
At least there was an OpenAI staffer asking for opinions about possible limit parameters here a few weeks ago, but there is nothing to try yet.
ChatGPT supports streaming from GPTs, which are pretty much the same concept as Assistants agents. It likely just uses techniques they don’t want exposed, like actually receiving the true token output from the model for live decision-making.
Let me add one more improvement that would really strengthen the Assistants API, which is awesome since it allows us to provide one-on-one services to a community of users: enable JSON mode for the Assistants API.
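For reference, JSON mode already exists on the Chat Completions endpoint via the `response_format` parameter (the prompt must also mention JSON for the mode to be accepted); this is the option the Assistants API is missing. A sketch of the existing Chat Completions usage:

```python
import json

# With the official SDK (pip install openai), JSON mode on Chat Completions
# is enabled via response_format -- the feature this post asks for on the
# Assistants API:
#
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4-1106-preview",
#       response_format={"type": "json_object"},
#       messages=[
#           {"role": "system", "content": "Reply in JSON."},
#           {"role": "user", "content": "List three colors."},
#       ],
#   )
#   data = json.loads(resp.choices[0].message.content)

# JSON mode guarantees syntactically valid JSON output, so parsing the
# reply directly is safe; e.g. a response might look like:
sample = '{"colors": ["red", "green", "blue"]}'
parsed = json.loads(sample)
```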