I use the GPT-4o API in my application. Even though a new question about an image I previously sent by URL is part of the same chat history, the API sometimes says it cannot see the image, and sometimes gives a correct or an incorrect answer.
It also sometimes gives answers that have nothing to do with the image at all.
Welcome to the community!
Can you give us an example of how you're sending off your first and second requests? (just the messages list)
Here is a real chat example. Don't be misled by the assistant's use of the 3.5 model for the text-only turns; the same thing happens when the 4o model is used for those turns.
content: "How many jars are there in the image?"
createdAt: June 14, 2024 at 3:22:58 PM UTC+3
images: ["https…storage.googleapis.com/…0VUpaaki4zQsKA"]
role: "user"

content: "There are four jars in the image."
createdAt: June 14, 2024 at 3:23:08 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "What fruits are in the jars in the image?"
createdAt: June 14, 2024 at 3:23:30 PM UTC+3
role: "user"

content: "The fruits in the jars are strawberries, blueberries, raspberries, and blackberries."
createdAt: June 14, 2024 at 3:23:31 PM UTC+3
gptVersion: "gpt-3.5-turbo-0125"
role: "assistant"

content: "Wrong answer, can you double check the picture I sent?"
createdAt: June 14, 2024 at 3:24:34 PM UTC+3
role: "user"

content: "I apologize, but as an AI, I am unable to view or analyze images. If you could describe the fruits in the jars to me, I can help identify them for you."
createdAt: June 14, 2024 at 3:24:35 PM UTC+3
gptVersion: "gpt-3.5-turbo-0125"
role: "assistant"
If you have an image anywhere in your context that you want the model to use, you need a vision-capable model. gpt-3.5 doesn't have vision.
The model doesn't "remember" what it saw "earlier" in the conversation. The concept of time doesn't really exist here; the model has to look at everything again on every run.
Just to confirm:
- you're using the OpenAI Assistants framework
- you're changing models mid-conversation
Here's why I'm asking:
Some people send one message after another against the completions endpoint and wonder why the model keeps "forgetting"/hallucinating the past conversation. With the chat completions endpoint, you need to send the whole conversation history to the API with every new message. If you sent images, you'll need to send them again with every call.
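Roughly, a follow-up request has to look something like this (just a sketch; the URL and message texts below are placeholders, not your actual payload):

// Second turn against the chat completions endpoint: the image part is
// repeated in the history, otherwise the model has nothing to look at.
const messages = [
  {
    role: 'user',
    content: [
      { type: 'text', text: 'How many jars are there in the image?' },
      { type: 'image_url', image_url: { url: 'https://example.com/jars.png' } }, // resent on every call
    ],
  },
  { role: 'assistant', content: 'There are four jars in the image.' },
  { role: 'user', content: 'What fruits are in the jars in the image?' },
];
const completion = await openai.chat.completions.create({
  model: 'gpt-4o', // must stay a vision-capable model whenever the image is in the history
  messages,
});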
If you're using the Assistants framework, it should manage that for you.
But a non-vision model absolutely cannot interpret images for you. I'm surprised (but not that surprised) that you don't get an error.
Can you give an example where you call it correctly and it still doesn't work?
Let me show you a new example that uses the gpt-4o model throughout:
content: "Hello, how many objects are in the image?"
createdAt: June 14, 2024 at 4:18:24 PM UTC+3
images: ["https…storage.googleapis.com…G%2FA%3D%3D"]
role: "user"

content: "The image contains four objects."
createdAt: June 14, 2024 at 4:18:26 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "It looks like there might have been a misunderstanding - I cannot view images. However, if you describe the red object to me, I can try to help identify it!"
createdAt: June 14, 2024 at 4:19:14 PM UTC+3
gptVersion: "gpt-4o"
role: "assistant"
I am not using the Assistants API that OpenAI provides. My server-side code is as follows; let me show some parts of it:
const chatHandler = (): { [k: string]: RequestDefault } => ({
async interpret(req, res): ResponsePromise {
try {
const { messageHistory, config, userId, sessionId, userMessage, imageUrls } = req.body
....
let _messageHistory = handleHistoryLimit(messageHistory);
_messageHistory = handleMaxAllowedChars(_messageHistory);
messages = getMessages(_messageHistory, config);
....
const completion = await openai.chat.completions.create({
messages: messages,
model: isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125',
});
const answer = completion.choices[0].message.content.trim();
answerContent = {
content: answer,
createdAt: admin.firestore.FieldValue.serverTimestamp(),
gptVersion: isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125',
role: 'assistant',
};
} else if (userMessage && imageUrls) {
messages = [
{
role: 'user',
content: [
{ type: 'text', text: userMessage },
...imageUrls.map((imageUrl) => ({
type: 'image_url',
image_url: { url: imageUrl },
})),
],
},
];
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: messages,
});
const answer = response.choices[0].message.content.trim();
answerContent = {
content: answer,
createdAt: admin.firestore.FieldValue.serverTimestamp(),
gptVersion: response.model,
role: 'assistant',
};
Can you JSON-dump this messages object, and turn off the model-switching mechanic?
I tried it the way you suggested, but the result is still the same:
content: "How many objects are in the image?"
createdAt: June 14, 2024 at 4:47:29 PM UTC+3
images: "https…storage.googleapis.com/…2B2CSMosQ%3D%3D"
role: "user"

content: "There are four objects in the image."
createdAt: June 14, 2024 at 4:47:39 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "What kind of object is the red object in the image I sent?"
createdAt: June 14, 2024 at 4:48:22 PM UTC+3
role: "user"

content: "I'm sorry, but I can't view images. However, I can help you with descriptions or answer questions!"
createdAt: June 14, 2024 at 4:48:24 PM UTC+3
gptVersion: "gpt-4o"
role: "assistant"
import json
print(json.dumps(messages))
I'd like to investigate what you're actually sending.
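(Your handler is Node rather than Python, so the equivalent would be something like the line below, with messages being whatever you pass to chat.completions.create:)

// Node equivalent of the json.dumps call above
console.log(JSON.stringify(messages, null, 2));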
const chatHandler = (): { [k: string]: RequestDefault } => ({
async interpret(req, res): ResponsePromise {
try {
const { messageHistory, config, userId, sessionId, userMessage, imageUrls } = req.body;
const newMessage = messageHistory ? messageHistory[messageHistory.length - 1] : null;
if (!userId) {
return res.status(422).json(toRes(null, 'Missing userId.'));
}
if (!sessionId) {
return res.status(422).json(toRes(null, 'Missing sessionId.'));
}
const openai = new OpenAI({
organization: process.env.OPENAI_ORG_ID,
apiKey: process.env.OPENAI_API_KEY,
});
let messages;
let answerContent;
if (messageHistory && config) {
if (!newMessage || !newMessage.content) {
return res.status(422).json(toRes(null, 'Missing information inside data object.'));
}
let _messageHistory = handleHistoryLimit(messageHistory);
_messageHistory = handleMaxAllowedChars(_messageHistory);
if (_messageHistory.length <= 0) {
return res.status(413).json(toRes(null, 'Object is too big.'));
}
messages = getMessages(_messageHistory, config);
// JSON dump the messages object
console.log('Messages-1:', JSON.stringify(messages, null, 2));
const completion = await openai.chat.completions.create({
messages: messages,
model: 'gpt-4o',
});
const answer = completion.choices[0].message.content.trim();
answerContent = {
content: answer,
createdAt: admin.firestore.FieldValue.serverTimestamp(),
gptVersion: 'gpt-4o',
role: 'assistant',
};
} else if (userMessage && imageUrls) {
messages = [
{
role: 'user',
content: [
{ type: 'text', text: userMessage },
...imageUrls.map((imageUrl) => ({
type: 'image_url',
image_url: { url: imageUrl },
})),
],
},
];
// JSON dump the messages object
console.log('Messages-2:', JSON.stringify(messages, null, 2));
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: messages,
});
const answer = response.choices[0].message.content.trim();
answerContent = {
content: answer,
createdAt: admin.firestore.FieldValue.serverTimestamp(),
gptVersion: response.model,
role: 'assistant',
};
printLog(response.choices[0].message.content, 'yellow');
} else {
return res.status(422).json(toRes(null, 'Missing information inside data object.'));
}
await admin
.firestore()
.collection('users')
.doc(userId)
.collection('chats')
.doc(sessionId)
.collection('messages')
.add(answerContent);
return res.status(201).json(toRes({ answer: answerContent.content }, 'Request interpreted successfully!', false));
} catch (error) {
printLog(error, 'red');
return res.status(500).json(toRes(error, `Internal Server Error: ${String(error.message)}`));
}
},
});
Messages-2: [
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "How many objects are in the image and what are their types?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https…storage.googleapis.com/…606lA%3D%3D"
        }
      }
    ]
  }
]
printLog(response.choices[0].message.content, 'yellow');
====>
The image contains four objects:
- A pink rectangular prism.
- A red cylinder.
- A beige cube.
- A dark brown rectangular prism.
Messages-1: [
  {
    "role": "system",
    "content": "Your name is ***…"
  },
  {
    "role": "assistant",
    "content": "Hi,There!"
  },
  {
    "role": "user",
    "content": "How many objects are in the image and what are their types?"
  },
  {
    "role": "assistant",
    "content": "The image contains four objects:\n1. A pink rectangular prism.\n2. A red cylinder.\n3. A beige cube.\n4. A dark brown rectangular prism."
  },
  {
    "role": "user",
    "content": "Which is the longest object in the image I sent?"
  }
]
And the assistant's last response: "Sorry, I can't view images. Describe the objects, please?"
I think the problem is that the image is not saved back into the history.
4o is trash compared to 4, to be frank - a total downgrade.
I keep all my chats, and the quality drop is immediately apparent.
It got so bad that I configured my GPTs to use 4 exclusively.
Yeah, you need to send it again. You can't just strip the image out of your subsequent call…
It's obvious that it can't see something you're not sending it.
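Something along these lines in your handler should do it - just a sketch; toApiMessages is a made-up helper name, and the images field is taken from the documents you posted:

// Rebuild the API messages from the stored history on every request,
// re-attaching any image URLs so follow-up turns still include them.
// Assumes each stored message looks like { role, content, images? }.
const toApiMessages = (history) =>
  history.map((m) =>
    m.role === 'user' && Array.isArray(m.images) && m.images.length > 0
      ? {
          role: 'user',
          content: [
            { type: 'text', text: m.content },
            ...m.images.map((url) => ({ type: 'image_url', image_url: { url } })),
          ],
        }
      : { role: m.role, content: m.content }
  );

// e.g. in the history branch of the handler:
// messages = toApiMessages(_messageHistory);
// const completion = await openai.chat.completions.create({ model: 'gpt-4o', messages });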