Hi Thiago,
It’s entirely possible that you’re seeing a side effect of OpenAI’s attempts to mitigate hallucinations.
In my opinion, they’re doing a pretty good job: all the prior prompt strategies that mitigate hallucination still work, and many prompts that would likely have caused hallucinations in the past now lead the model to give a conservative answer instead.
I canceled my ChatGPT subscription because I think the low message limit is ridiculous, so I’m using the API. However, the strategies should still work with ChatGPT.
Here’s my conversation with the hard drive:
user:
My friend sent me this image. can you help them out? Feel free to use bing whenever you need to.
gpt-4-vision-preview:
AI Answer: The image shows a label on a Western Digital 8TB hard drive, model WD80EFAX. This model belongs to the WD Red series of hard drives, which are often used in NAS (Network Attached Storage) environments. However, the label itself does not specify the rotational speed (RPM) of the drive.
WD Red drives typically come in two main types related to RPM: the standard Red drives that traditionally run at 5400 RPM and the Red Pro drives which operate at 7200 RPM. To find the specific RPM for this model, we would need to refer to the product specifications from Western Digital or a reliable retailer’s website.
Since the RPM is not directly listed on the label, I will look up the specifications for the WD80EFAX model. Please hold on for a moment.
<<<BING_BOT: WD80EFAX RPM>>>
Bing(“WD80EFAX RPM”)
The WD80EFAX is a Western Digital Red Plus NAS Internal Hard Drive with a capacity of 8TB and a speed of 5400 RPM
Jupyter code:

```python
import requests
import json
import os
import base64


def get_encoded_image_data(image_path):
    with open(image_path, "rb") as image_file:
        return f'data:image/png;base64,{base64.b64encode(image_file.read()).decode("utf-8")}'


# image source
image_path = './img/wdhd.png'
encoded_image_data = get_encoded_image_data(image_path)

# API key
openai_api_key = os.getenv("OPENAI_API_KEY")
if openai_api_key is None:
    raise ValueError("OpenAI API key is not set in environment variables.")

url = "https://api.openai.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {openai_api_key}"
}
data = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a helpful assistant. If you want to retrieve information from the internet, you can say '<<<BING_BOT: {query}>>>', and it will call bing bot that will inject an answer to a query into your message"
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "My friend sent me this image. can you help them out? Feel free to use bing whenever you need to."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": encoded_image_data
                    }
                }
            ]
        }
    ],
    # "tools": [
    #     {
    #         "type": "function",
    #         "function": {
    #             "name": "bing",
    #             "description": "search the internet with bing to retrieve any information from the world wide web!",
    #             "parameters": {
    #                 "type": "object",
    #                 "properties": {
    #                     "query": {
    #                         "type": "string",
    #                         "description": "your search query",
    #                     }
    #                 },
    #                 "required": ["query"],
    #             },
    #         },
    #     }
    # ],
    "max_tokens": 300
}

response = requests.post(url, headers=headers, json=data)

# Check if the request was successful
if response.status_code == 200:
    print("Response from OpenAI:", response.json())
    # Extracting the relevant content (modify as needed)
    answer = response.json().get('choices', [])[0].get('message', {}).get('content', '')
    print("\nAI Answer:", answer)
else:
    print("Error:", response.status_code, response.text)
```
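Note that the `<<<BING_BOT: …>>>` marker is just a convention I invented in the system prompt; nothing actually calls Bing in the script above. A minimal sketch of the missing dispatch step, assuming you scan the model's output for the marker and splice search results back in (the `run_bing_search` helper is hypothetical — in practice it would be backed by a real search API):

```python
import re

# Matches the marker convention defined in the system prompt above.
BING_MARKER = re.compile(r"<<<BING_BOT:\s*(.*?)>>>")


def run_bing_search(query):
    # Hypothetical helper: a real setup would call an actual search API
    # and return a short text summary of the results.
    return f"[bing results for {query!r}]"


def inject_search_results(model_output):
    """Replace each <<<BING_BOT: query>>> marker with search output."""
    return BING_MARKER.sub(lambda m: run_bing_search(m.group(1)), model_output)


text = "Please hold on for a moment. <<<BING_BOT: WD80EFAX RPM>>>"
print(inject_search_results(text))
```

From there you could feed the spliced-in text back to the model as a continuation, which is roughly what the transcript above simulates.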
Overall, you need to tell the model exactly what to do, and give it hints for how it could be done. Badgering it and commanding it to “just do” stuff generally won’t give you the best results.
If you allow it to think through the problem (“what does the user actually want?”) and give it hints about the tools you’d like it to use and how the task could be done (“feel free to use bing”, “feel free to use bing to reference the official documentation”), you will generally get a better result. This is in essence just a dirty proto-Chain-of-Thought prompt, and if you have more involved tasks, you may need to use more involved techniques.
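To make the contrast concrete, here are two prompts side by side — both are illustrations I made up for this thread, not anything official:

```python
# A "just do it" prompt that tends to produce a guess or a refusal:
bad_prompt = "How many RPMs is this drive? Just give me the number."

# A proto-Chain-of-Thought prompt: it states the actual goal, leaves
# room for the model to reason, and hints at the tool to use:
good_prompt = (
    "My friend sent me this image. Can you help them out? "
    "First describe what the label actually says, then work out "
    "what's missing. Feel free to use bing whenever you need to."
)

print(good_prompt)
```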
You can probably use custom GPTs to configure a gpt that understands that when you say “how many rpms is this” that it should self re-prompt into a more CoT like query.
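Sketching what such a self re-prompt could look like as plain API calls — the rewrite instruction below is my own wording, not anything OpenAI ships with custom GPTs, and `gpt-4` stands in for whichever model you use:

```python
import os

import requests

# Assumed instruction for the rewriting step (my wording, not official).
REWRITE_PROMPT = (
    "The user asks terse questions like 'how many rpms is this'. "
    "Rewrite their question into a step-by-step query: what do they "
    "actually want, what is already visible, what must be looked up?"
)


def expand_query(user_question):
    """Ask the model to expand a terse question into a CoT-style query."""
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4",
            "messages": [
                {"role": "system", "content": REWRITE_PROMPT},
                {"role": "user", "content": user_question},
            ],
        },
    )
    return response.json()["choices"][0]["message"]["content"]
```

You would then send the expanded query (plus the image) as the real request, same as in the script above.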
But just to get started, you can get quite far by simply employing some empathy, which the models seem to reward. Being combative and rolling your eyes is counterproductive.
When I explain it to people, I like to say that the LLMs are acting a little like mirrors to your mind (instead of photons, it’s bouncing ideas back to you). If you’re frustrated with the quality you get out of it (disregarding content filters and guardrails), in most cases it’s a reflection of yourself that you’re mad about.
Hope this helps somewhat 