I’ve been encountering a perplexing issue with the OpenAI API for the past 2-3 days and was hoping to get some insights or solutions from this knowledgeable community. I am aware that there was a similar topic from last year but it does not solve my problem.
Issue Summary: I have a script that interacts with the OpenAI API, specifically requesting responses from the GPT-4 model. This script has been functioning as expected until recently. Despite requesting GPT-4 explicitly and being billed for GPT-4 tokens, the nature of the responses suggests that I’m receiving outputs from GPT-3 instead.
Why I Think I’m Receiving GPT-3 Responses: My confidence in this observation comes from the qualitative difference in responses to certain queries. Notably, GPT-4 has a specific way of handling requests for information outside its training data, such as stating its inability to access external websites, whereas GPT-3 tends to generate responses regardless. The discrepancy became apparent when I reran scripts with the same inputs and noticed a marked difference in the responses, which no longer matched the expected behavior of GPT-4.
Sample Code: The code is straightforward, except for the network_error_detect function, a simple helper that checks whether the API response says the model was unable to access the website, so the loop can try again. Its contents are unimportant. I created it because gpt-4 routinely fails to read web sites on the first attempt and succeeds after a retry. Of course, it relies on gpt-4’s behavior of declaring the error in its response, which gpt-3 does not do. (I know it’s ugly, but it works.)
def get_response_simple(self, system_message, user_message, model="gpt-4", max_attempts=4):
    for _ in range(max_attempts):
        # Make the API call using ChatCompletion for chat models
        completion = self.client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": user_message},
            ],
        )
        # Extract and return the response
        response = completion.choices[0].message.content
        is_failure = self.network_error_detect(response)
        if is_failure:
            self.do_nothing(15)  # back off before retrying
        else:
            return response
    # All retries failed: return the last response with an error banner
    response_message = ("##*** Network error. Maximum retries exceeded. ***\n\n"
                        "###Prompt:\n\n" + user_message + "\n\n###Response:\n\n")
    return response_message + response
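For reference, network_error_detect boils down to a string check along these lines (the phrase list below is illustrative only, not my real one):

def network_error_detect(self, response):
    # True if the reply text reads like gpt-4 announcing it couldn't reach the URL.
    # Illustrative phrases; tune them to the failure messages you actually see.
    failure_phrases = [
        "unable to access external websites",
        "can't browse the internet",
        "cannot access the link",
    ]
    lowered = response.lower()
    return any(phrase in lowered for phrase in failure_phrases)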
Request for Community Insights:
I know this problem has cropped up before and the responses were rather dismissive, but I need to find a solution.
The response says it is gpt-4-0613. The OpenAI usage dashboard says it’s gpt-4-0613. But it’s not acting like gpt-4. Anyway, I’ve opened a support request; I only found this community afterward, not realizing humans were actually looking at these posts. But I’m happy to entertain any ideas.
It’s possible that you’ve used gpt-4-turbo and not gpt-4; there’s a significant difference. Set your model to either gpt-4-1106-preview or gpt-4-0125-preview (a one-line change, shown below).
gpt-4-0613 is the worst of the gpt-4 models, a significant slump from 0314 (which isn’t available to most people anymore).
The turbo models are slightly dumber, but more stable in terms of hallucinations and such.
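Using the same variables as your get_response_simple above, pinning the snapshot is just:

completion = self.client.chat.completions.create(
    model="gpt-4-0125-preview",  # pinned snapshot instead of the floating "gpt-4" alias
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
)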
Yes. The OpenAI dashboard is very specific about what is being used. The only gpt-4 variant is gpt-4-0613. I hit the playground a couple of times today but I rarely use it.
GPT-4 is the worst? (GPT-4 points to gpt-4-0613) Hahaha.
The only thing improved about hallucinations in the turbo models is that you get a wall of denials from half the inputs you’d send, and no clever inferences out of the rest.
I think that’s valuable in its own right. Maybe I haven’t been evaluating 0613 fairly, but using 1106 to prevent 0314 from going off the rails works pretty well. I consider the denials a feature.
Are you telling me 0613 has any advantage in any scenario?
Where did you make your initial assessment? Do you want to share your prompt?
Here is my current test code that I set up for support, including the prompt. Mary is a fake attorney:
# Test Main
import os
from datetime import datetime

from openai_test_processor import OpenAITestProcessor

# Initialize output
ai = OpenAITestProcessor()
context = 'You are a helpful assistant.'
prompt = 'Go to https://63898d4121644524ae9b79145f5ed630.stophatingyourbusiness.com/law-offices-of-mary-pason/ and tell me what industry Mary Pason is in.'
ai.set_initial_context(context)

# Get the current date and time
now = datetime.now()
timestamp = now.strftime('%Y-%m-%d %H:%M:%S')
response = ai.get_one_response(prompt)
content = f"{timestamp}: {response}\n\n"

# Get the current date and time again for the second query
now = datetime.now()
timestamp = now.strftime('%Y-%m-%d %H:%M:%S')
response = ai.get_one_response('What GPT model is this?')
content += f"{timestamp}: {response}"

# Write both responses to a file, creating the output folder if needed
output_folder = 'test_output'
file_name = 'test.txt'
output_path = os.path.join(output_folder, file_name)
os.makedirs(output_folder, exist_ok=True)
with open(output_path, "w") as f:
    f.write(content)
Class:
# openai_test_processor.py
import os

from openai import OpenAI


class OpenAITestProcessor:
    def __init__(self):
        self.conversation_history = []  # Keep track of messages to the GPT
        # Initialize the OpenAI client from the environment API key
        self.api_key = os.getenv('OPENAI_API_KEY')
        self.client = OpenAI(api_key=self.api_key)

    def set_initial_context(self, system_message):
        # Clear existing conversation history and add the initial system message
        self.conversation_history = []
        if system_message:
            self.conversation_history.append({"role": "system", "content": system_message})

    def get_one_response(self, user_message):
        # Make the API call using ChatCompletion for chat models
        completion = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=self.conversation_history + [{"role": "user", "content": user_message}],
        )
        # Print the resolved model snapshot reported by the API
        print(completion.model)
        # Extract and return the response
        response = completion.choices[0].message.content
        return response
I can’t browse the internet or click on links, so I’m unable to directly check the website you’ve provided to determine the industry Mary Pason is in. If you’re interested in the law offices of Mary Pason, it’s likely that she is involved in the legal industry or provides legal services. If you need more specific information, you might want to describe the services or information listed on the website, and I can provide more detailed insights based on that.
gpt-4-0125-preview
That’s what you wanted, right?
I’d avoid these arbitrary endpoints, because things will randomly break. Pick a fixed version. Things can still break, but it’s a little less chaotic.
So it’s broken for everyone then? I would’ve thought someone would have noticed. The original code just specified ‘gpt-4’ and it worked from January up until last week.
There was talk of switching it to 1106, then they changed their minds, or maybe they unchanged their minds. Eventually they added the generic gpt-4-turbo-preview. I didn’t track it because it doesn’t really matter.
It’s indeed possible that you were using 0613 up until now, but perhaps your use-case drifted so you just never noticed until now? Really hard to say.
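If you want to see what a generic alias resolves to at any given moment, the response object reports the actual snapshot (it’s the same thing your print(completion.model) shows):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resolved = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=1,  # one throwaway token is enough to get the model field back
).model
print(resolved)  # e.g. "gpt-4-0125-preview" as of this thread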
It’s been the same project the whole time. But this does lead to the obvious question: if anyone has code still working that relies on internet access, I’d like to know what model they’re using, because I can’t find one that works.
Or maybe they just don’t know their models aren’t working and they are getting garbage answers. It took me a minute to realize it.
None of the API models do; none of them can reach the internet. You need to build a function/tool that the model can call, which fetches pages from the internet for it. A sketch follows.
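A minimal sketch of that pattern with the chat completions tools API; fetch_page and its schema are my own illustration, not anything built into the API:

import json
import urllib.request
from openai import OpenAI

client = OpenAI()

def fetch_page(url):
    # Naive fetcher for illustration; a real one should sanitize, truncate, and strip HTML.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read(20000).decode("utf-8", errors="replace")

tools = [{
    "type": "function",
    "function": {
        "name": "fetch_page",
        "description": "Download the raw contents of a web page.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string", "description": "The page URL"}},
            "required": ["url"],
        },
    },
}]

messages = [{"role": "user", "content": "Go to <your URL> and tell me what industry Mary Pason is in."}]
first = client.chat.completions.create(model="gpt-4-0125-preview", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]  # assumes the model decided to call the tool
page_text = fetch_page(json.loads(call.function.arguments)["url"])

# Feed the tool output back so the model answers from real page content
messages += [first.choices[0].message,
             {"role": "tool", "tool_call_id": call.id, "content": page_text}]
answer = client.chat.completions.create(model="gpt-4-0125-preview", messages=messages, tools=tools)
print(answer.choices[0].message.content)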
But I think the picture is becoming clearer now.
Is it possible that you’ve indeed been using 0613 all this time and just thought you had internet access? If that’s the case, you’ve been getting hallucinations all along.
I know for certain that I had internet access. It would be impossible to generate the output I have without it. I suppose it’s possible that having Internet access was a bug that was fixed.
Understand that these models were trained on the majority of the internet up until some time in 2021, so they can take a very educated guess at what a link contains from the URL alone. Your test URL literally includes law-offices-of-mary-pason, so guessing the legal industry requires no browsing at all.