As we move more OpenAI products into production environments, it would be advisable to add greater transparency to status updates.
Right now it is impossible to know whether the issues developers are experiencing are tied to the incidents being investigated, which leaves us guessing whether a problem is something we can fix in our own design or something on OpenAI's side.
More detailed status updates are necessary for production environments. Simply saying that you are aware of an issue and are working on it is not sufficient. This is a necessary consequence of an increasingly complex offering.
Hey! Can you share more? My original reading was that you were asking for context on status.openai.com incidents, but the wording here makes it seem otherwise. Can you clarify?
Yes. I'm suggesting that status.openai.com carry more detailed updates on the impact developers can expect from current incidents.
For the past 48 hours the Assistants API has been hallucinating function calls, telling users that it's not configured to perform functions in its current state and that it is "simulating" a reply.
There's no indication in the incident report of whether the issue you are investigating relates to function calling or to something else. So in our own use case we don't know whether OpenAI is experiencing an issue that impacts us, or whether the model behaviour has just changed and we therefore need to adapt.
I know the Assistants API is in a beta state. But for production we'd need more detailed updates to decide whether to take action or to wait.
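For anyone hitting the same ambiguity, here is a minimal sketch of how one might distinguish a genuine function call from a hallucinated "simulated" reply by inspecting the run. The thread/run IDs and the keyword check are assumptions for illustration, not an official diagnostic.

```python
from openai import OpenAI

client = OpenAI()

THREAD_ID = "thread_abc123"  # placeholder IDs for illustration
RUN_ID = "run_abc123"

# A real function call surfaces as status "requires_action" with tool_calls
# attached; a hallucinated one shows up only as assistant text claiming to
# be "simulating" a reply.
run = client.beta.threads.runs.retrieve(thread_id=THREAD_ID, run_id=RUN_ID)

if run.status == "requires_action":
    calls = run.required_action.submit_tool_outputs.tool_calls
    print("Genuine function call(s):", [c.function.name for c in calls])
else:
    latest = client.beta.threads.messages.list(thread_id=THREAD_ID).data[0]
    text_parts = [p.text.value for p in latest.content if p.type == "text"]
    # Crude heuristic -- adjust the keyword to whatever your assistant says.
    if any("simulat" in t.lower() for t in text_parts):
        print("Model is narrating a 'simulated' call instead of calling the tool.")
```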
That is precisely my point. How would we know if the issues are related?
As the product matures, I'd also recommend greater transparency on how tokens will be compensated in these events. It's a tough one, since the model will often produce some output while an outage is ongoing, and in a lot of cases that output might be adequate.
But at the same time, it might not be… It would be good to have a clearly articulated policy.
What I think is being described, and is needed, is a (perhaps limited-exposure) changelog, as sketched below. As API users, we can perceive changes to model operations, and continued dysfunction after a specific point in time, but we can't directly investigate the change that caused them.
Example:

| Date | Exposure | Description |
|---|---|---|
| 2024-02-02 | all gpt-3.5-turbo (preview), production | softmax mixture optimization |
| 2024-02-02 | assistants multi_tool_use | token cutoff, adaptive temperature |
| 2024-02-04 | ChatGPT 4 eval model 2, 10% A/B feedback | RLHF training update |
| 2024-02-06 | all gpt-4-0613 | reverted embeddings masking update 2024-01-30 |
| 2024-02-13 | ChatGPT, USA 20% | memory, rollout step 2 |
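To make the idea concrete, here is a minimal sketch of what one entry in such a changelog could look like as structured data that API consumers could filter programmatically. The field names and the sample entries below mirror the table above but are otherwise my own assumptions, not anything OpenAI has announced.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ChangelogEntry:
    """One hypothetical line of a developer-facing change log."""
    effective_date: date   # when the change went live
    exposure: str          # which models / surfaces / cohorts were affected
    description: str       # what changed, in one line
    reverted: bool = False # whether the change was later rolled back

entries = [
    ChangelogEntry(date(2024, 2, 2), "all gpt-3.5-turbo (preview), production",
                   "softmax mixture optimization"),
    ChangelogEntry(date(2024, 2, 6), "all gpt-4-0613",
                   "reverted embeddings masking update of 2024-01-30", reverted=True),
]

# A consumer could then filter for changes that overlap its own deployment:
affected = [e for e in entries if "gpt-4-0613" in e.exposure]
print(affected)
```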
You could discuss with your configuration manager whether particular deployments, such as "rollout to those developers who opted in to beta stage 2", should still be kept proprietary… once you have both of those in place.
Forgot to mention: this could extend to other impactful things, like account-management changes, the availability of user limits on the limits page, the information presented on the usage page, or even updates to tier limits.
I can't speak as a developer, but from a management perspective you should be clear about your work, and I talk about this regularly in the forums and in emails. There should be advance notice, especially about the team's work. Before I knew what was going on, I had wasted many hours: it is impossible to distinguish whether a problem comes from the AI itself or from some other effect of the team's work. This will only get bigger as the business sector adopts it; it causes problems for users who try to use the product right away without being aware of the changes. Many times when I make contact, I raise management issues, especially communication. You have a website used by millions of people. The messaging could be far clearer if a team spent even a few months putting together a proper communication package. It shows that you could, but didn't.
I really could not agree more. The communication should be, and can be, much better.
To illustrate, our recent issue was resolved by recreating our "assistant". We had to disable code interpreter. For some reason code interpreter was launching during ordinary dialogue (to do things like calculate relative dates, such as "tomorrow" or "next week"). Once code interpreter launched, it seemed it would no longer use the actual functions we provided; instead it would only "mock" results. This was a substantial change in operation affecting multiple models. It is disappointing that it was not communicated, but I am sharing it here in case anyone else has this issue, and also to demonstrate why communication is important.
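For anyone who needs to do the same, this is roughly how one can recreate an assistant with code interpreter removed, keeping only function tools. The model, name, instructions, and function schema below are placeholders for illustration, not our actual configuration.

```python
from openai import OpenAI

client = OpenAI()

# Recreate the assistant with only function tools, omitting
# {"type": "code_interpreter"} so it can no longer take over ordinary dialogue.
assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",  # illustrative model choice
    name="support-bot",          # placeholder name
    instructions="Answer questions and call the provided functions when needed.",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "resolve_relative_date",  # placeholder function
                "description": "Convert phrases like 'tomorrow' into an ISO date.",
                "parameters": {
                    "type": "object",
                    "properties": {"phrase": {"type": "string"}},
                    "required": ["phrase"],
                },
            },
        }
    ],
)
print(assistant.id)
```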
We also noted the strange behaviour of these failed functions consuming approximately 30,000 tokens per message, as noted earlier in this thread. Our actual context is closer to 6,000 tokens, so whatever errors occurred were causing a substantial overstatement of token usage, which I hope will be rectified.
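As a sanity check on billing, one approach is to tokenize the thread's own messages locally and compare that against what the API reports for a run. The IDs and the 3x threshold below are illustrative, and the usage field on the run object is an assumption based on our testing; treat it as such if your SDK version differs.

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4")

THREAD_ID = "thread_abc123"  # placeholder IDs
RUN_ID = "run_abc123"

# Rough estimate of the context we actually supplied: every text part of
# every message currently in the thread, tokenized locally.
messages = client.beta.threads.messages.list(thread_id=THREAD_ID, limit=100)
expected = sum(
    len(enc.encode(part.text.value))
    for m in messages.data
    for part in m.content
    if part.type == "text"
)

# What the API says the run consumed (assumes run.usage is populated).
run = client.beta.threads.runs.retrieve(thread_id=THREAD_ID, run_id=RUN_ID)
reported = run.usage.total_tokens if run.usage else None

print(f"estimated context tokens: {expected}, reported by API: {reported}")
if reported and reported > 3 * expected:
    print("Reported usage is far above our own estimate -- flag for review.")
```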