Which is the correct model for image analysis?

import OpenAI from "openai"
import axios from "axios" // For downloading the image
import fs from "fs/promises" // Optional, for debugging by saving the image locally

const openai = new OpenAI()

async function fetchImageAsBase64(url) {
	try {
		const response = await axios.get(url, { responseType: "arraybuffer" })
		const base64 = Buffer.from(response.data).toString("base64")
		const mimeType = response.headers["content-type"] // Get MIME type from headers
		return `data:${mimeType};base64,${base64}`
	} catch (error) {
		console.error("Failed to fetch the image:", error.message)
		throw error
	}
}

async function imageDescription() {
	try {
		// Example image URL
		const imageUrl = "https://pbs.twimg.com/profile_images/992464734606655488/DVvC0bK6_400x400.jpg"
		const imageBase64 = await fetchImageAsBase64(imageUrl)

		// Send the request to the OpenAI API
		const response = await openai.chat.completions.create({
			model: "gpt-4-turbo",
			max_tokens: 300,
			messages: [
				{ role: "user", content: "What is this picture of?" },
				{ role: "user", content: imageBase64 }
			]
		})

		// Output the response
		console.log("Response:", response.choices[0].message.content)
	} catch (error) {
		console.error("API call failed:", error.response?.data || error.message)
	}
}

imageDescription()

This does not work. I tried many variations and none of them work.

This “messages” format is a complete fabrication; nothing like it exists in the API.

You need to carefully review the API reference for chat completions and construct the multi-block user content properly, as in the sketch below.
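Something along these lines should work. This is a minimal sketch only; the model name, image URL, and function name are placeholders, so swap in your own:

import OpenAI from "openai"

const openai = new OpenAI()

async function describeImage() {
	const response = await openai.chat.completions.create({
		model: "gpt-4o", // any vision-capable model
		messages: [
			{
				role: "user",
				// One user message whose content is an array of blocks
				content: [
					{ type: "text", text: "What is this picture of?" },
					{ type: "image_url", image_url: { url: "https://example.com/image.jpg" } }
				]
			}
		]
	})
	console.log(response.choices[0].message.content)
}

describeImage()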

Models: https://platform.openai.com/docs/models

Thanks, this works completely fine now:

import OpenAI from "openai"

const openai = new OpenAI()

async function imageDescription() {
	const response = await openai.chat.completions.create({
		model: "gpt-4o",
		messages: [
			{
				role: "user",
				content: [
					{ type: "text", text: "What's in this image?" },
					{
						type: "image_url",
						image_url: {
							url: "https://www.hollywoodreporter.com/wp-content/uploads/2023/05/GettyImages-946730430-H-2023.jpg?w=1296"
						}
					}
				]
			}
		]
	})
	console.log(response.choices[0])
	const assistantMessage = response.choices[0].message.content
	console.log(`AI: ${assistantMessage}`)
	console.log(`Token Usage: ${response.usage.total_tokens} tokens used`)
}
imageDescription()
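One note on the original base64 approach: the image_url block also accepts a data URL, so the fetchImageAsBase64 helper from the first post can be reused when the image is not publicly reachable. A rough sketch of that variant, reusing that helper and the openai client defined above (not re-tested here):

async function describeLocalImage(imageUrl) {
	// Returns a string like "data:image/jpeg;base64,...."
	const imageBase64 = await fetchImageAsBase64(imageUrl)
	const response = await openai.chat.completions.create({
		model: "gpt-4o",
		messages: [
			{
				role: "user",
				content: [
					{ type: "text", text: "What's in this image?" },
					// Pass the data URL in place of an https URL
					{ type: "image_url", image_url: { url: imageBase64 } }
				]
			}
		]
	})
	return response.choices[0].message.content
}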

I was able to do this pretty easily with “vision” in the Assistants API. I have not done it with chat completions though. The Assistant was pretty cranky about the image format when trying to pass it in context.

My workaround was to give the image to GPT as a file, then attach that file to the thread and run the assistant query on it. I tried several different methods of attaching the file, but this seemed the most reliable. When uploading the file, make sure you give it “vision” and not “assistants” as the purpose.

Can you please provide your Node code?

Here are some snippets. Just make sure to use the file id of the uploaded file in the add-message API call. You can also make this more compact by creating the thread with the initial message, but I find it easier to break the steps out when debugging and learning. A rough sketch of what these helpers might look like follows after the steps.

1. Add the file to OpenAI:

	let file = await fileCreate(pathToImage, 'vision');

2. Create the thread:

	let c_thread = await createThread();

3. Put a message on the thread; just give the payload the file id:

	let messageId = await addImageFile(c_thread, file.id);

   Inside that call, the message payload's content is:

	content: [
		{
			"type": "image_file",
			"image_file": {
				"file_id": file.id,   // file id from the response object of the upload
				"detail": "auto"
			}
		}
	],

4. Run the assistant on the thread with whatever your query is:

	const run = await runAssistant(c_thread);
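For completeness, here is roughly how those helper wrappers could be written with the openai Node SDK. This is my own sketch of what fileCreate, createThread, addImageFile, and runAssistant might look like, not the original poster's code; the assistant id and the question text are placeholders:

import fs from "fs"
import OpenAI from "openai"

const openai = new OpenAI()

// 1. Upload the image; purpose must be "vision", not "assistants"
async function fileCreate(pathToImage, purpose) {
	return openai.files.create({ file: fs.createReadStream(pathToImage), purpose })
}

// 2. Create an empty thread
async function createThread() {
	return openai.beta.threads.create()
}

// 3. Add a user message that references the uploaded file by id
async function addImageFile(thread, fileId) {
	return openai.beta.threads.messages.create(thread.id, {
		role: "user",
		content: [
			{ type: "text", text: "What is in this image?" }, // placeholder query
			{ type: "image_file", image_file: { file_id: fileId, detail: "auto" } }
		]
	})
}

// 4. Run the assistant on the thread
async function runAssistant(thread) {
	return openai.beta.threads.runs.create(thread.id, { assistant_id: "asst_XXXX" }) // placeholder assistant id
}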