I ran the exact code given in the documentation for the vision API, but I got the error below. The same applies to the gpt-4-turbo model. Am I missing something?
openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid type for 'messages[0].content[1].image_url': expected an object, but got a string instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content[1].image_url', 'code': 'invalid_type'}}
Here is the code:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
Your code appears to be missing the object structure under the "image_url" key.
The correct format should look like this:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "high"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
The "detail" key can also be set to "low", but it appears it cannot be omitted.
I often make this mistake too, and it can be a bit confusing.
I hope this helps you even a little 🙂
prem.s (May 15, 2024, 11:56am):
Thank you! That works.
But I ran into a new problem when using an AWS S3 presigned URL, which gave the same error. Will explore further.
ebobr (May 17, 2024, 2:47am):
I get the same error when using an S3 presigned URL with gpt-4o or gpt-4-turbo. No error with gpt-4-vision-preview. OP, have you found a solution?
Hi @ebobr, for me the problem got solved when I did not include max_tokens in the request. But this is not an actual solution; it's a workaround we found. The problem with S3 bucket URLs still exists.
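For anyone stuck on the presigned-URL case: one possible workaround (a sketch, not something confirmed in this thread) is to download the object yourself and pass it as a base64 data URL, which the SDK's ImageURL type says is also accepted. The requests dependency, the placeholder URL, and the JPEG content type are all assumptions here.
import base64

import requests
from openai import OpenAI

client = OpenAI()

# Hypothetical presigned URL; substitute your own.
presigned_url = "https://example-bucket.s3.amazonaws.com/photo.jpg?X-Amz-Signature=..."

# Fetch the bytes ourselves, then send a base64 data URL instead of the
# presigned link, so the API never has to dereference the S3 URL.
image_bytes = requests.get(presigned_url, timeout=30).content
b64_image = base64.b64encode(image_bytes).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)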
I got the same error. The API documentation is wrong; it corresponds to the gpt-4-vision-preview model:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
Yes, it seems that the documentation isn't always correct. Please refer to the method I mentioned above.
The format of the returned data is no longer JSON. Here is another excerpt from the OpenAI documentation:
from openai import OpenAI

client = OpenAI()

audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file
)
{
    "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
To access the response, you previously wrote transcript["text"] (as with a dictionary), but now you have to write transcript.text: that is attribute access on a class instance. In short, you get back an object, not JSON...
My impression is that during last year's big OpenAI update, with the move from the openai.Audio.translate style to client.audio.transcriptions.create, the documentation didn't keep up and ended up mixing the two...
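To make the object-versus-JSON point concrete, here is a minimal sketch with the current client; model_dump() assumes the SDK's pydantic-based response objects and is only needed if you want a plain dict back.
from openai import OpenAI

client = OpenAI()

with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Attribute access on the returned object, not dict indexing:
print(transcript.text)        # works with the current SDK
# print(transcript["text"])   # the old pattern; raises TypeError now

# If you really need a dict (e.g. for json.dumps):
data = transcript.model_dump()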
This solution saved my day with GPT-4o, plus converting max_tokens to an integer. Thank you!
Thanks a bunch. Works like a charm. They should really update the documentation here:
https://platform.openai.com/docs/api-reference/chat/create
I'm not sure if you took a look at OpenAI's GitHub sample, but what you're suggesting doesn't seem to be correct:
# Parsing PDF documents for RAG applications

This notebook shows how to leverage GPT-4V to turn rich PDF documents such as slide decks or exports from web pages into usable content for your RAG application.

This technique can be used if you have a lot of unstructured data containing valuable information that you want to be able to retrieve as part of your RAG pipeline.

For example, you could build a Knowledge Assistant that could answer user queries about your company or product based on information contained in PDF documents.

The example documents used in this notebook are located at [data/example_pdfs](data/example_pdfs). They are related to OpenAI's APIs and various techniques that can be used as part of LLM projects.
_j (June 30, 2024, 6:51pm):
There are multiple methods for passing images to vision models, with support varying by model, and there is no grid showing which model accepts which API message format.
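To summarize the two shapes that appear in this thread (the example URL below is just a placeholder): the object form matches the SDK's ImageURL type and works with gpt-4o and gpt-4-turbo, while the bare-string form from the older documentation examples is what produces the 400 error above.
# Object form: "image_url" maps to an object with a required "url" key.
ok_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/image.jpg", "detail": "auto"},
}

# Bare-string form: triggers the 400 "expected an object, but got a string
# instead" error on gpt-4o and gpt-4-turbo.
bad_part = {
    "type": "image_url",
    "image_url": "https://example.com/image.jpg",
}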
I apologize, my explanation was not accurate...
It should have been stated that there needs to be a "url" key within the object corresponding to the "image_url" key, and this "url" key should contain the actual URL.
By doing this, it will function correctly.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
The "detail" key is not actually required; if omitted, it defaults to "auto", letting the model decide whether to use "low" or "high".
Sorry for the inaccurate explanation...
# File generated from our OpenAPI spec by Stainless. See CONTRIBUTING.md for details.
from __future__ import annotations
from typing_extensions import Literal, Required, TypedDict
__all__ = ["ChatCompletionContentPartImageParam", "ImageURL"]
class ImageURL(TypedDict, total=False):
    url: Required[str]
    """Either a URL of the image or the base64 encoded image data."""

    detail: Literal["auto", "low", "high"]
    """Specifies the detail level of the image.

    Learn more in the
    [Vision guide](https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding).
    """