I just wanted to get your opinions on this. In my experience, the first two days of playing around with the ChatGPT API were amazing, but since then I’ve observed a significant degradation. It’s not only my own perception: it’s reflected in some customer KPIs that I monitor. My two main concerns:
The latencies are terrible now, text-davinci-003 level or worse. Under the same experimental conditions, my median latency has grown from 4 to 10 seconds.
Inconsistencies between the Playground and the API. I’m unable to reproduce the same (awesome) results that I get in the Playground when I use the API and complex prompt engineering is involved, not even when I don’t use the system role at all. Each environment gives almost deterministic results when I set temperature = 0.0, but they are inconsistent with each other, and the Playground’s are usually way better.
I don’t know if this only applies to my use case or is a general pattern. I apologize for not being able to provide code to back up my statements, but in this case it’s protected by an NDA. Anyways: any thoughts on this? Has anyone been able to tackle this discrepancy effectively? If so, how?
And if an OpenAI representative wants to step up and provide an explanation for this, I’d be very happy to read it. Anyways, thanks a lot folks! Keep up building awesome stuff!
There is no difference between the Playground and the API; the Playground is just a GUI. If there is any sort of difference, you may have something wrong or different in your request.
You can validate this by sending a request and looking at the network logs:
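For instance, something like this minimal sketch (placeholder prompt, your own key) hits the same chat completions URL the playground does, so you can line the JSON body up against what the browser’s network tab shows:

```python
import requests

# Same endpoint the browser's network tab shows for playground chat requests.
URL = "https://api.openai.com/v1/chat/completions"

headers = {
    "Authorization": "Bearer sk-...",  # your API key
    "Content-Type": "application/json",
}

# Placeholder payload: mirror the exact model, messages and temperature
# that appear in the playground's request body.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0.0,
}

resp = requests.post(URL, headers=headers, json=payload, timeout=60)
print(resp.status_code)
print(resp.json()["choices"][0]["message"]["content"])
```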
Are you streaming your response or waiting for the full completion? A median latency of 4-10s is brutal, and not what I’ve experienced - except for the odd moment. Do you get the same delay when calling another endpoint such as moderation?
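If you want a quick sanity check, a rough timing comparison like this (a sketch using the Python SDK with a throwaway prompt) would tell you whether the slowdown is specific to chat completions or affects every endpoint:

```python
import time
import openai

openai.api_key = "sk-..."  # your API key

# Time a moderation call.
start = time.perf_counter()
openai.Moderation.create(input="Throwaway sentence for a latency test.")
print(f"moderation round trip: {time.perf_counter() - start:.2f}s")

# Time a chat completion with the same kind of trivial input.
start = time.perf_counter()
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=0.0,
)
print(f"chat completion round trip: {time.perf_counter() - start:.2f}s")
```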
Have you seen a difference between the pinned and unpinned models? They are supposed to evolve the unpinned model over time, and maybe this is causing higher latencies:
Pinned: gpt-3.5-turbo-0301
Unpinned: gpt-3.5-turbo
They could even be the same right now, but force it to the pinned and then the unpinned model and see if there is a difference in latency. I’d be curious as to what you find.
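A quick-and-dirty timing loop along these lines (a sketch with a placeholder prompt, taking the median over a handful of identical calls) would show whether the two behave differently:

```python
import time
import openai

openai.api_key = "sk-..."  # your API key

def median_latency(model: str, runs: int = 5) -> float:
    """Median wall-clock time over a few identical requests to `model`."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": "Say hello."}],  # placeholder
            temperature=0.0,
        )
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

# Compare the pinned snapshot against the unpinned alias.
for model in ("gpt-3.5-turbo-0301", "gpt-3.5-turbo"):
    print(f"{model}: {median_latency(model):.2f}s")
```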
That’s actually a very good idea, thanks for the suggestion @RonaldGRuckus. I hadn’t checked the network logs, but there might still be differences, even if you’re using the same host:
I’m using the asynchronous endpoint of the Python SDK, not curl. I assume there shouldn’t be any difference between Python and cURL here, but I will check it out (rough shape of my call in the sketch after this list).
I’m not streaming, while the Playground does. But I also refuse to believe that there is any difference in results whether you stream or not.
Any other difference? Headers? Idk…
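To be concrete without sharing the NDA’d code, the shape of my call is roughly this (simplified sketch, placeholder prompt, nothing else fancy):

```python
import asyncio
import openai

openai.api_key = "sk-..."  # the real key comes from config

async def ask() -> str:
    # Non-streaming call through the SDK's async variant,
    # with the same parameters I set in the playground.
    resp = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hello."}],  # placeholder prompt
        temperature=0.0,
    )
    return resp["choices"][0]["message"]["content"]

print(asyncio.run(ask()))
```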
But I can confirm that I get different (and inconsistent) results. The issue does not reproduce with other endpoints (content moderation is included in my pipeline and its latencies have not changed).
But you actually gave me very good ideas to keep researching. I’ll inspect the network logs and report what I find (if anything). It would also be useful to understand where the “system” message is included, for instance. @curt.kennedy I’m working with the gpt-3.5-turbo model. Thanks a lot for the suggestion as well!!
I’ll report back here if I find anything worth mentioning.
There shouldn’t be any difference between using the playground and calling the API yourself unless there’s something different in the request.
Accepting a streamed response would result in a faster perceived response time. The end result would be the same, but at least you can start returning some data sooner rather than waiting until the completion is done.
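Roughly like this (a sketch with a placeholder prompt): with stream=True you can measure time-to-first-token separately from total completion time, which is where the perceived speedup comes from:

```python
import time
import openai

openai.api_key = "sk-..."  # your API key

start = time.perf_counter()
first_token_at = None
parts = []

# stream=True yields partial "delta" chunks as they are generated.
for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write two sentences about latency."}],
    temperature=0.0,
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
        parts.append(delta["content"])

total = time.perf_counter() - start
print(f"first token after {first_token_at:.2f}s, full completion after {total:.2f}s")
print("".join(parts))
```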
Except for the API key I don’t think any header would modify the API response.
Yeah, of course. If the request is the same, the result should be the same (aside from the inherently non-deterministic nature of the decoding process). I agree that the request must differ somewhere if the host is the same, but I don’t know what the difference is; that’s the point.
I also agree on how the streaming should behave.
Anyways, I’ll let you guys know if I find anything. Thanks a lot!!
The models could very well be the same, but the physical hardware (servers, data centers, etc.) could be different between the Playground and the API, and could also be different when you call the pinned and unpinned models. Different physical servers could make a huge difference, even with the same model. They aren’t serving this off of one server; it’s multiple.
Very fair. However, it would be completely illogical to send playground requests to a different server, as its purpose is to be a comparative interface. Unless they have some whitelist for playground requests, there’s nothing to indicate that this is happening. Occam’s razor.
Regarding the model, I agree, it could be that they’re using a different version; that’s why it’s important to view the network logs.
EDIT
I did actually find something that differentiates playground requests from API requests.
I have noticed that the header for the OpenAI auth key is different (it starts with “sess-” rather than “sk-”, which is what all my API keys start with), so you may be right.
The way I imagine it is that they are using thousands of servers across the world, or across the United States. So to balance the load, they have to send requests to different servers/regions. They had an outage a few months ago that was caused by a bad network configuration in one entire class of servers they are using, so they even have different classes of servers. Who knows what physical hardware or region you are accessing, which servers they bring online at the edge, and what differences those different servers and infrastructures bring!
I made an edit above: there actually is a difference between the playground and a custom API call. So you may be right. Although if it does cause discrepancies, well, that’s not good. I haven’t personally noticed any differences.
This makes total sense. They should have a way to differentiate between API and Playground requests.
Still, your suggestion of inspecting the network logs applies. Unless it’s the host that changes; in that case, we’re just totally blind.
Right, but if @AgusPG is in a different region or part of the world, it could be different for him… because of the different-server thing. This stuff is so hard to sort out, because the infrastructure of OpenAI is a black box to us.
Playground is only a GUI. It calls the same endpoint and uses the same parameters. It wouldn’t make sense to differentiate between the two.
@curt.kennedy Yes, you’re right, it could be a network latency issue. However, it’s important to note that a call made from the playground is still made from the user’s computer, so the same latency expectations should apply, unless, as discussed, there is some strange routing happening because it’s a playground request. Which could be true.
There are two steps that I believe would solve this problem and shed some light:
Check the network logs for the request, and compare (the sketch after these steps shows one concrete thing worth comparing)
Allow for streaming in the API call and see how quickly it returns
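On the first step, one concrete thing to compare is the diagnostic response headers. The API has historically returned things like a processing-time header and a request id (treat the exact header names below as an assumption and simply print whatever your response carries); lining those up between a raw call and the playground’s network log tells you whether the extra time is being spent server-side:

```python
import requests

URL = "https://api.openai.com/v1/chat/completions"
headers = {"Authorization": "Bearer sk-...", "Content-Type": "application/json"}
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say hello."}],  # placeholder
    "temperature": 0.0,
}

resp = requests.post(URL, headers=headers, json=payload, timeout=60)

# Header names are an assumption based on what the API has returned in the past;
# .get() simply prints None for any header that isn't present.
for name in ("openai-model", "openai-processing-ms", "x-request-id"):
    print(f"{name}: {resp.headers.get(name)}")
```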
The main reasons I believe there isn’t any special treatment of playground requests are:
It just wouldn’t make sense to permit special routing for playground requests
Better results from the playground have never been a noted feature, or issue
In most cases where people say that the playground works better, it has turned out to be an issue with their API request
I disagree. Even if it’s only for monitoring purposes, you might want to differentiate between the two. And there’s an ongoing history of different behavior between the Playground and the API. Not so long ago, they did not even use the same models (some outages were reported for models that had a “-playground” label on them, for instance). I didn’t even know that the Playground was using the same host and the same underlying model as the API for the chat use case.
Anyways, I do agree with you that inspecting the network logs is the way to go here. Thanks a lot for the suggestion folks, that helps a lot! I’ll keep you informed.
It doesn’t make sense to differentiate between the two.
The playground is simply a testing ground so we don’t have to build our own wrappers to try out the API endpoints. It does nothing different. The API key is excusable because it’s generated differently.
If there are differences between playground and a custom API wrapper, that defeats the purpose of having the playground.
However it was previously, it’s not that way as of right now.
As noted in the example from my initial response, there is no difference in the request or URL.
Same here.
I had identical latency of 3-8s ten days ago.
Now the playground takes 5s on average, while the API call averages 40s… Nothing has changed on my end: identical prompts, parameters, etc., so definitely something changed on OpenAI’s side.
As I said, I fully disagree with your first statement. I can think of tons of cases where OpenAI would prefer to treat data collected via the Playground differently from data we send via the official API, and many other things. For instance, in their Aligning language models to follow instructions paper, they clearly specified that they were using data collected via the Playground:
In my view, your opinion on the actual role of the Playground is just that: an opinion.
Anyways, this is not the purpose of the thread hahaha. You helped me a lot with your suggestion. I’ll let you know if I find anything
It’s really not an opinion. It completely defeats the purpose of the playground to have different outputs.
Their reasoning for collecting data is clearly stated: “It’s easier for consent purposes”.
Data collection has nothing to do with the output of the playground.
I’m just trying to give you an actual example of why it makes sense to differentiate between the two. You literally said that “it doesn’t make sense”, and it does. I’m not arguing about whether it makes sense to differentiate with respect to the output that both generate. But I do see differences in my results; that’s the only certainty that I have.
Anyways, rest assured that I’ll let you know if I find anything else. Thanks a lot folks, you were all really helpful.