GSEPE
November 6, 2024, 6:59pm
Hello fellows,
I cannot get GPT-4o to read and process a local image via the API.
The console.log(responseContent.choices);
returns
[
  {
    index: 0,
    message: {
      role: 'assistant',
      content: `I'm unable to directly analyze or view the content of files like images...
I'm sharing only part of my script.js here:
import { promises as fs } from 'fs';
import path from 'path';
import ProgressBar from 'progress'; // npm "progress" package
// OPENAI_API_KEY and API_MODEL are defined elsewhere in the script

// Function to interact with OpenAI API and show upload progress
async function uploadWithProgress(filePath, mimeType) {
  const fileSize = (await fs.stat(filePath)).size;
  const bar = new ProgressBar(`Uploading [:bar] :percent :etas | ${path.basename(filePath)} `, {
    total: fileSize,
    width: 40,
    complete: '=',
    incomplete: ' ',
    clear: true,
  });
  const messages = [
    {
      role: 'system',
      content: `File Content: ${path.basename(filePath)}`
    },
    {
      role: 'user',
      content: 'please analyze the content of this file and provide the response in JSON format.'
    }
  ];
  const requestBody = JSON.stringify({
    model: API_MODEL,
    messages: messages
  });
  const options = {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${OPENAI_API_KEY}`,
    },
  };
  // ... rest of the function omitted ...
Am I forced to use type: "image_url",
or am I missing something that would let me use local images?
Thank you in advance for any help!
First, you need the vision format (image_url) when sending images. Then, for local images, use base64 encoding.
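In Node, the encoding step can look like this (a minimal sketch, with a hypothetical local path):

import fs from 'fs';

// Read the local image and encode its raw bytes as base64.
const base64Image = fs.readFileSync('./photo_input/TEST.jpeg').toString('base64');

// The vision format expects a data URL inside an image_url content part.
const imageUrl = `data:image/jpeg;base64,${base64Image}`;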
_j
November 7, 2024, 2:24am
What you are attempting to send is wrong in several ways.
You tagged this “assistants”, but Assistants does not support uploading files within messages, even if encoded to base64.
Yet you build a requestBody with model and messages, which hints that you are using chat.completions and not Assistants.
Besides not encoding the image properly and sending it as part of a message's contents, you are trying to put an image into a system message, where it is not permitted; the system message is instead where you should put instructions that the AI CAN look at images.
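For illustration, a minimal sketch (the instruction text is hypothetical, and base64Image is assumed to already hold the encoded file):

const messages = [
  {
    role: 'system',
    // Instructions belong here, including telling the model it CAN use its vision
    content: 'You are an assistant with computer vision. Analyze any image the user attaches.'
  },
  {
    role: 'user',
    // The image itself goes in the user message, as an image_url content part
    content: [
      { type: 'text', text: 'Please analyze this image.' },
      { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${base64Image}` } }
    ]
  }
];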
This previous forum post has simple application code showing how to make a system message for a task, how a user provides the images, and how they must be encoded.
Hi all of you.
The above doesn’t overcome any issues.
If you get refusals, it may be because your “user” is deliberately asking about identifying people, or that your “system” message hasn’t overcome pretraining against this use.
The AI also must be encouraged to have and use its own image computer vision. It is given to you in a state of “not knowing”.
Having a structured output and a singular task makes it less likely to receive a refusal API return, which you should handle, along with oth…
Or how to create the user message alone:
There are multiple methods to pass images, and also methods exclusive to Assistants.
Take for example this Python code, which supplies alternating strings or base64 images (with size and capability under your control) as user message contents, without the JSON-like “type”:
import base64
import os

image_paths = ["./img1.png", "./img2.png"]
file_names = [os.path.basename(path) for path in image_paths]
base64_images = []
for path in image_paths:
    with open(path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode("utf-8")
        base64_images.append(base64_image)
Or how those messages would individually appear as content, with the user providing instructions:
It would be essentially the same as sending other few-shot examples: you give an input, and you demonstrate the way the AI responds, so that it can begin following a pattern.
These latest models, such as the 1106 version of gpt-4-turbo that vision is based on, are highly-trained on chat responses, so previous input will show far less impact on behavior.
After the system message (that still needs some more demonstration to the AI), you then pass example messages as if they were chat that occurr…
So the fault is not “gpt-4o API unable to directly analyze”, but rather “user unable to directly analyze API reference documentation.” Hopefully the links I provided serve as better documentation.
GSEPE
November 7, 2024, 11:54am
Straight to the point, thanks.
So, the local file can be processed using the image_url argument (after being converted to base64).
For those wondering how to write a small Node script.js that reads a local image and returns output in JSON format, here you are:
import OpenAI from "openai";
import { OPENAI_API_KEY } from './tokens.js';
import { readFile } from 'fs/promises';

const openai = new OpenAI({ apiKey: OPENAI_API_KEY });
const imagePath = './photo_input/TEST.jpeg';

// Read the local image and encode it as base64
async function encodeImageToBase64(imagePath) {
  try {
    const data = await readFile(imagePath);
    return data.toString('base64');
  } catch (err) {
    console.error('NOT ABLE TO READ THE FILE:', err);
  }
}

async function main() {
  const base64_image = await encodeImageToBase64(imagePath);
  if (!base64_image) return;

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [
      {
        role: 'system',
        content: `You are ...`
      },
      {
        role: "user",
        content: [
          { type: "text", text: "please analyze the content of this image and provide the response in JSON format following this scheme: ..." },
          {
            type: "image_url",
            image_url: {
              url: `data:image/jpeg;base64,${base64_image}`,
            },
          },
        ],
      },
    ],
  });

  console.log(response.choices[0]);
}

main();
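One caveat: the script hardcodes image/jpeg in the data URL. If you feed it other formats, the prefix should match the actual file; for example (a small sketch, where the mimeTypes map is a hypothetical helper):

import path from 'path';

// Map common image extensions to MIME types so the data URL prefix
// matches the file being sent (hypothetical helper, not part of the SDK).
const mimeTypes = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.webp': 'image/webp',
  '.gif': 'image/gif',
};
const mimeType = mimeTypes[path.extname(imagePath).toLowerCase()] ?? 'image/jpeg';
// ...then build the image part as: `data:${mimeType};base64,${base64_image}`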