Python's Silent Standoff with GPT-4-Vision-Preview - Seeking help!

Hello folks, I'm having some issues with the GPT-4 vision model. I'm trying to create a chat completions response, but having no luck with it; I'll attach my function below. I would appreciate it if someone could give me some insight into what is wrong with it.
I've been wrestling with the GPT-4 vision model and my completion function is playing hard to get: it's giving me the cold shoulder, no responses. I'm coding blind here (literally, as I'm totally blind) and could use an extra pair of eyes. I'm rocking the latest Python package (3.3.1), but something's amiss.

Function below:

def add_image_context(cls, prompt: str, prompt_type: str):
    prompt_url = ""
    prompt_messages_list = []
    prompt_content_dict = {
        "role": prompt_type,
        "content": None
    }
    prompt_text_dict = {
        "type": "text",
        "text": None
    }
    prompt_url_dict = {
        "type": "image_url",
        "image_url": None
    }
    # checking if the prompt contains a URL
    if prompt.find("https") != -1:
        lines_list = prompt.split("\n")
        prompt_url = lines_list[0].strip()
        prompt = cls.list_to_text(lines_list[1:])
        prompt_text_dict["text"] = prompt
        prompt_url_dict["image_url"] = {"url": prompt_url}
        # append the text and image parts so the content list is not empty
        prompt_messages_list.append(prompt_text_dict)
        prompt_messages_list.append(prompt_url_dict)
        prompt_content_dict["content"] = prompt_messages_list
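To make the intent concrete, here's a minimal self-contained sketch of the message that function should end up building (the `list_to_text` stand-in and the sample URL are my own illustrative assumptions, not the poster's actual helper):

```python
def list_to_text(lines):
    # hypothetical stand-in for cls.list_to_text: rejoin the remaining lines
    return "\n".join(lines).strip()

# a prompt whose first line is the image URL, with the question below it
prompt = "https://example.com/cat.jpg\nDescribe this image for a blind person."
lines_list = prompt.split("\n")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": list_to_text(lines_list[1:])},
        {"type": "image_url", "image_url": {"url": lines_list[0].strip()}},
    ],
}
```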

JSON dumps result from the chat_context (context):

        "role": "system",
        "content": "You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible. Current date: 2023-11-16"
        "role": "user",
        "content": [
                "type": "text",
                "text": "Can you give me the image description suitable to the blind person?"
                "type": "image_url",
                "image_url": {
                    "url": ""

I’m having trouble with the vision preview model; it’s not working correctly, although other models are fine. Oddly, OpenAI’s example works, but mine doesn’t, and I’m not sure why. I’m blind, so I need a text-based solution. I’ve ruled out other functions as the problem, so I’m looking for help to figure out what’s going wrong.


I forgot to mention that I'm getting no errors and no exceptions. I'm just getting no response; basically, my program hangs and does nothing…

response = client.chat.completions.create(
	model = cls.model,
	messages = context,
	max_tokens = 1000
)

Hey mate, and welcome to the community!

Having GPT download the images is a bit wonky; I'd recommend downloading them yourself and using this approach instead:
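For instance, a minimal sketch of the download-and-encode approach (the file name is a placeholder, and I'm assuming a JPEG; adapt the MIME type to your image):

```python
import base64

def encode_image(image_path: str) -> str:
    # read the local file and return its base64 text representation
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def to_data_url(image_path: str) -> str:
    # build a data URL usable wherever an https image URL would go
    # (assumes a JPEG; change the MIME type for PNG etc.)
    return f"data:image/jpeg;base64,{encode_image(image_path)}"
```

You'd then put `to_data_url("your_downloaded_file.jpg")` in the `"url"` field of the `image_url` part instead of a web address.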


As I mentioned in my post, the example provided by OpenAI works just fine. And the fact is, they themselves recommend using URLs for images. So I have a suspicion there is something wrong with my code…
I suppose I could try encoding the image to base64. I haven't tried that yet, but I would really like to use URLs.
I appreciate your input though.


Welcome to the dev forum @zen_mode

What does the response object look like in your case?

Also, what's the cls object? I'm asking because it looks like the request is using cls.model to get the model parameter. Why not simply define gpt-4-vision-preview explicitly as the model?

Supported versions of Python are 3.7.1 to 3.11.5, and I would stay in the middle of that range for the highest compatibility.

Then, to use the latest openai library for Python, which was even updated today, you would open a command prompt or shell and issue:

pip install --upgrade openai

Which will now get you to version 1.3.0

If you would simply like a script that works for vision on your own files, I created one at this forum link just yesterday:

It will print the AI response to your request for image analysis, and also uses a method to get the rate limit header and token usage.

Then finally, with an image coming from your own script, you have the ability to downsize it, keeping the image to one or four tiles instead of up to sixteen, all of which are billed as prompt tokens.
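As a rough sketch of that downsizing math (the 512 px figure assumes the model's 512×512 tile size; the helper name is mine):

```python
def downscaled_size(width: int, height: int, max_side: int = 512) -> tuple:
    # scale so the longer side fits within max_side, preserving aspect ratio;
    # max_side=512 targets a single tile, 1024 would allow a 2x2 grid of four
    scale = min(1.0, max_side / max(width, height))
    return (round(width * scale), round(height * scale))
```

For example, a 1024×768 photo comes out as 512×384, fitting a single tile; you could then do the actual resize with an imaging library before encoding and sending.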


I’m working with multiple models and the function I posted above is a part of another function. I have a separate function just for vision related requests.

My Python version is 3.9. I meant the package version in the post…
For the URL, I use the URL provided by the OpenAI documentation for the vision model.
I will investigate your provided script though, thanks for that.
Although I'm looking specifically for what is wrong with my own code… As for what's printed by the response object: I never get that far, since I'm getting no response at all.

I looked at your code to process images. The fact is, I'm trying to do exactly the same thing, except I'm passing the list of objects. But something is wrong, I guess, because I'm getting no response from the API. No error, nothing.

All sorted! Turns out that it wasn't my function's problem at all. The solution for now is:

response = client.chat.completions.with_raw_response.create(
	model = "gpt-4-vision-preview",
	messages = context,
	max_tokens = 1000
)
response = response.parse()

And do the usual stuff to get the data about response…

  • Thanks @_j for providing the link.
  • It looks like for now we have to manually parse the JSON.
  • Accessibility on this forum sucks for screen readers. If by any chance there are devs reading this, get back to me and I will help you test it, so it's properly accessible to folks like myself.

I have to edit this post because I can't mark @_j's reply as the solution, so the admins will have to do it. It's inaccessible to screen reader users: by the looks of it, some kind of message pops up and I can't get to it with the keyboard. That proves my point, doesn't it…

OpenAI developers are geniuses. Accessibility is about education, more than anything else…
I would be curious to know how many devs here had any introduction to accessible design in college or university, because I got none myself. Many developers don't have a clue what a screen reader is, let alone how it works. Anyway, I'm finished with my complaining :)

You don't HAVE to use the "with raw response". I threw that into my code because the rate limit on the vision model is an always-present concern for active use, and that's how you can now access the headers.

You can just use the python library’s typical response grabbing, streaming without asyncio for example:

from openai import OpenAI
client = OpenAI()
system = [{"role": "system",
           "content": """You are chatbot who enjoys computer programming."""}]
user = [{"role": "user", "content": "brief introduction?"}]
chat = []
while not user[0]['content'] == "exit":
    response = client.chat.completions.create(
        messages = system + chat[-20:] + user,
        model="gpt-3.5-turbo", top_p=0.5, stream=True)
    reply = ""
    for delta in response:
        if not delta.choices[0].finish_reason:
            word = delta.choices[0].delta.content or ""
            reply += word
            print(word, end="")
    chat += user + [{"role": "assistant", "content": reply}]
    user = [{"role": "user", "content": input("\nPrompt: ")}]

You can even go back to a familiar dictionary form:

model = client.chat.completions.create(
    messages = system + user,
    model = "gpt-3.5-turbo")
response = model.model_dump()

but with your vision user message and vision model.

I know what you're saying, but everything works for me just fine with other models, including GPT-4, just not the vision one; and getting the raw response does work, so I don't know…
Ideally it should work without getting the raw response, but it doesn't, even though it works with all the other models. That's why I originally asked what the problem with my function is: if the request is made the way I originally posted, my program just hangs there and I never get a response back from the create-completions call.
I don't know, but it seems to me maybe I'm constructing the final dictionary wrong or something. But then again, why does the raw-response JSON work when the original list of dicts does not?

It works without raw response. The problem is that you have an old version of the library installed.

Just run:

pip install --upgrade openai

and follow the vision guide.

I have the latest version as of today…
I check for the new version every few days…