Microsoft just dropped the nerf bat. One request per minute now. Even though my Cognitive Services bill is $400 so far this month, they're restricting me. They're clamping down on AI agents instead of just expanding infrastructure. Sad. They're holding back this technology and its development.
Is it so hard for people here to realize that they are intentionally limiting this to make sure none of you builds something big enough before they do? This industry is not for you indies and devs.
This is like the hardware chip industry: only the big companies have a place in it.
This is probably a mistake, because within weeks many small startups will start offering this exact capability. Don't forget they invented nothing new; this tech was actually developed outside the US, and most researchers are outside the US. It's just that THEY trained it on the whole internet's data and have friends and access to the CEOs of Reddit, GitHub, etc.
OK, I figured it out. Everything works as expected and is fast. The answer is: stream.
And I was wrong earlier: streaming also works with the chat API.
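For anyone wondering what that looks like in practice, here is a minimal sketch of streaming with the chat API, assuming the pre-1.0 openai Python package (the API key and prompt are placeholders, not what I actually use):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# stream=True makes the API send tokens back as they are generated,
# so the first words arrive within a second or two instead of after
# the whole completion is finished.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain streaming in one sentence."}],
    stream=True,
)

for chunk in response:
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
print()
```

Total generation time is about the same, but perceived latency drops a lot because you can show the partial answer immediately.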
What I don't get is why they don't just raise API prices to something more supply-and-demand based. Cloud providers do this and it works just fine. As new capacity comes online, they can lower prices.
Leave the web interface alone, of course, so people can play around, but if people have high-value use cases that don't consume a lot of tokens, they should be able to use the API, imho.
Mind explaining further what you mean by "stream"? How did you manage to make the API respond quickly? Thanks.
Surprisingly, with the API, GPT-3.5 Turbo, GPT-4, and Text-Davinci-003 have all been slower than Text-Davinci-002 recently. My company account's logic was so slow with GPT-4 that I tested it directly with a Postman call. Apparently all of them are slower than Text-Davinci-002, even with the same prompt and Text-Davinci-002 producing a longer completion.
However, Text-Davinci-002's accuracy is not good enough for my use case, so even though it is faster than the other models, I cannot use it.
Can anyone else try Text-Davinci-002 instead of the other models and compare the speed?
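In case it helps anyone reproduce this, a rough timing script along these lines should show the difference, assuming the pre-1.0 openai Python package (the key, prompt, and max_tokens value are placeholders, not what I actually used):

```python
import time
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

PROMPT = "Summarize the benefits of streaming API responses."  # placeholder prompt

def time_completion(model):
    # Older completion-style models go through the completions endpoint.
    start = time.time()
    openai.Completion.create(model=model, prompt=PROMPT, max_tokens=200)
    return time.time() - start

def time_chat(model):
    # gpt-3.5-turbo and gpt-4 go through the chat completions endpoint.
    start = time.time()
    openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=200,
    )
    return time.time() - start

for model in ("text-davinci-002", "text-davinci-003"):
    print(model, round(time_completion(model), 1), "s")
for model in ("gpt-3.5-turbo", "gpt-4"):
    print(model, round(time_chat(model), 1), "s")
```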
Search for "stream" on this site.
Any news or solution about this?
In ChatGPT Plus I get roughly 8 times more tokens than through the API (gpt-3.5-turbo) in the same amount of time.
This is affecting our project a lot.
The underlying implementation can summarize context and previous history to keep the thread of older conversations without needing to spend as many tokens.
It may also help to include only user text, not the bot's completions, in the context. The model seems perfectly capable of keeping the thread anyway when you do this.
That said, requests with less context will be faster, because compute cost grows roughly with the square of the context length, plus a constant factor related to the number of parameters. Doing less work runs faster, all else being equal.
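A rough sketch of what that could look like, assuming the pre-1.0 openai Python package; build_messages, history, and max_turns are made-up names for illustration, and history here is just the list of prior user messages:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def build_messages(history, user_input, max_turns=4):
    """Replay only the most recent user turns verbatim and fold the older
    ones into a short summary, so the prompt (and the latency) stays small."""
    older, recent = history[:-max_turns], history[-max_turns:]
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    if older:
        summary = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Summarize this conversation in two sentences:\n"
                           + "\n".join(older),
            }],
            max_tokens=100,
        )["choices"][0]["message"]["content"]
        messages.append({"role": "system", "content": f"Earlier context: {summary}"})
    # Only the user's own turns are replayed; the assistant's previous
    # completions are dropped entirely, as suggested above.
    messages += [{"role": "user", "content": turn} for turn in recent]
    messages.append({"role": "user", "content": user_input})
    return messages
```

The summarization call itself costs a little, so in practice you would cache that summary rather than regenerate it on every request.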
Same here. I bought the subscription, but it didn't change the API response time. With this response time the API isn't usable!
Having the same issues on the /chat/completions endpoint: responses with or without the stream option, for both gpt-3 and gpt-4, are extremely slow. I used Postman to test, and even OpenAI Playground response times are very slow.
Same issue here; I'm using 3.5-turbo. Is there any way to get a response from the support team? We all need to understand whether the time spent developing will pay off in the near future and whether we can offer a good service to our customers.
Just adding my +1 here in the hope that it helps get attention to this problem. I have to admit I'm a bit shocked that paying members seemingly get less server preference than free members. That doesn't make fundamental sense to me.
Same issue. I'm using text-davinci-003, and the average response latency is 45 seconds with a token limit of 500. What's interesting is that when making requests in a single thread, the initial response typically arrives in 10-15 seconds, but each subsequent request adds another 5-15 seconds of latency until the last one comes back with an error. Adding a delay (up to 2 seconds) between requests has no effect.
It seems like the delay is related to the "user" identifier passed in the request. I began to observe a gradual decline in response speed, which worsened over time. Eventually it became so poor that I expected users to start complaining; surprisingly, I haven't received any complaints yet. In short, through trial and error I realized that requests using a new user identifier returned much faster than the identifier I had been using. I'm curious whether others have encountered a similar situation.
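For anyone who wants to check whether the "user" field really is the culprit, a quick comparison along these lines should show it, assuming the pre-1.0 openai Python package (the ids, prompt, and token limit are placeholders):

```python
import time
import uuid
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def timed_request(user_id):
    # Send the same small request, varying only the end-user identifier.
    start = time.time()
    openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hello."}],
        max_tokens=20,
        user=user_id,
    )
    return time.time() - start

old_id = "the-identifier-we-have-been-sending"  # placeholder for the long-lived id
new_id = f"test-{uuid.uuid4()}"                 # brand-new id for comparison

print("old id:", round(timed_request(old_id), 1), "s")
print("new id:", round(timed_request(new_id), 1), "s")
```

If the brand-new id is consistently faster over several runs, that would support the theory above.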