I also have JSON output mentioned in my prompt.
Temperature is set to 0.2.
If it were a problem with the prompt, wouldn't it fail all the time? And it's a timeout, not an error. In your shared sample you set timeout to 5; what is the unit there?
Are you streaming the answer so that you can see the progress?
Because now you can't check the usage page every five minutes to see what you were charged in tokens: that prior feature was deliberately degraded, an obfuscating change to existing functionality that works against the user and makes ill-conceived products such as "assistants" unaccountable.
If you aren't streaming the completion, you won't see that the AI model has gone off the rails in a loop of nonsense tokens, filling up the context length, until the request times out on you without returning an answer.
Setting your own timeout just ensures you never see the 4,000 "\n\n\n" tokens you paid for. That is the likely result of a new AI model on the edge of breaking all the time, so much so that it was "trained" to produce JSON but still has to be instructed not to produce garbage.
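For reference, here is a minimal sketch of streaming with a client-side timeout, assuming the openai v1 Python SDK; the model name, prompt text, and the crude repetition guard are just placeholders:

```python
# Minimal streaming sketch (assumes openai>=1.0 and OPENAI_API_KEY in the environment).
from openai import OpenAI

client = OpenAI(timeout=60)  # client-side timeout, in seconds

stream = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    temperature=0.2,
    stream=True,  # stream so you can watch progress token by token
    messages=[
        {"role": "system", "content": "You always reply in JSON."},
        {"role": "user", "content": "Assess this ticket: ..."},  # placeholder input
    ],
)

chunks = []
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    chunks.append(delta)
    print(delta, end="", flush=True)
    # Crude guard: stop reading if the model starts looping on whitespace.
    if len(chunks) > 50 and all(c.strip() == "" for c in chunks[-50:]):
        break

answer = "".join(chunks)
```

That way you see the runaway output as it happens instead of paying for it blind.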
I would see if you can just prompt for the JSON output that you want the AI to produce, and NOT use the separately trained JSON-output mode that doesn't work right, which is invoked by the response_format parameter.
You always reply in the json format with the fields:
- ai_message: your message to the Human
- status: assessment status
- status_comment (very detailed comment to the status, not to be shown to the Human)
This is one of the examples. In some of the others I show actual JSON, like you do.
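If it helps, this is roughly how that looks in code: a sketch assuming the openai v1 Python client, with the JSON instructions carried entirely by the system prompt and no response_format parameter (the user message and fallback handling are placeholders):

```python
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You always reply in the JSON format with the fields:\n"
    "- ai_message: your message to the Human\n"
    "- status: assessment status\n"
    "- status_comment (very detailed comment to the status, not to be shown to the Human)"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    temperature=0.2,
    # no response_format here; the JSON instructions live in the prompt
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "The service is down again, please help."},  # placeholder
    ],
)

try:
    reply = json.loads(response.choices[0].message.content)
except json.JSONDecodeError:
    reply = None  # retry or fall back if the model didn't return valid JSON

print(reply)
```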
Hey, so I am currently creating batches and running requests async. Our company is Tier 4 and I'm trying to get closer to the 1 million tokens-per-minute limit.
The issue I'm experiencing is very long response times for single requests in the async batch (several minutes). This currently happens exclusively with the new 1106 model. GPT-3.5-turbo-0613 does not run into this issue, and I get much higher throughput with it (I would guess roughly 4x?).
Now to my question: since the new model is slower, will this change? And do the new 1106 API prices also apply to GPT-3.5-turbo-0613, or is 0613 currently more expensive than 1106?
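For context, this is roughly the pattern I'm using, simplified; a sketch assuming the openai v1 AsyncOpenAI client, where the concurrency cap, timeout, and prompts are placeholders:

```python
# Simplified async batch sketch (assumes openai>=1.0).
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(timeout=120)   # per-request timeout, in seconds
semaphore = asyncio.Semaphore(20)   # cap concurrent requests to stay under the rate limit

async def run_one(prompt: str) -> str:
    async with semaphore:
        response = await client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

async def run_batch(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*(run_one(p) for p in prompts))

# results = asyncio.run(run_batch(["prompt 1", "prompt 2"]))
```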
Hello, I have a similar issue: I have a new use case that only 1106 can do well, but I'm unable to migrate to 1106 because about 20% of the time the responses take over 20 seconds. For 0613 and 0301 this happens 1% of the time or less. What is the ETA for a fix?
This will get better with time and with more hardware and software refinement. At this stage it's all beta and should be treated as such. While beta development can be difficult, it does give you access to high-value markets with little in the way of direct competition… at least to start with, and the first-to-market advantage is often enough to carry products over the line for the long term.
I cannot stop being amazed at how people complain about the magic wand they were suddenly given, because the free socks they wished from it came in a slightly wrong shade of color…
Well, OpenAI is a company that offers its services and asks for payment, so I'd expect a certain 'quality' in those services.
It’s not just about complaining and whining. The earlier model does not have the slowness issue, but the newer one does. As we’re building commercial applications based on that, we’d like this problem to be fixed. Right now, with those delays, the model is simply unusable for lots of users (me included).
OpenAI is a commercial company now. If a service is substandard (and it is; I've never experienced delays like these before), then customers will complain until it's fixed. Looks normal to me.
Friends, I just confirmed it: OpenAI released 3.5-turbo-1106 just to annoy devs! I initially thought you were all just making noise over the issue, because I was busy trying to defend the company by studying the Assistants API from the playground. The playground seems fairly hospitable to a user; the API is really hostile, eh? It will literally choose when to reply to you and when it even processes your input tokens. It seems released just to drink input tokens without delivering results. I think we can all agree that anyone can decide to stay with the previous model and keep sorting out their input tokens and contexts, because 1106 is definitely drunk.