Incorrect API docs for computer use preview

I have been trying to run the computer-use-preview model via the OpenAI SDK. I have read the documentation.
At the moment it says:

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser" # other possible values: "mac", "windows", "ubuntu"
    }],
    input=[
        {
            "role": "user",
            "content": "Check the latest OpenAI news on bing.com."
        }
        # Optional: include a screenshot of the initial state of the environment
        # {
        #     type: "input_image",
        #     image_url: f"data:image/png;base64,{screenshot_base64}"
        # }
    ],
    reasoning={
        "generate_summary": "concise",
    },
    truncation="auto"
)

print(response.output)

For my task, I wanted to start with a screenshot as the initial state. So, naturally, I uncommented that code and tried to run it, but got an error about an invalid type. I was confused and went straight to the examples repo: simple_cua_loop.py
However, it gave me no idea how to start the computer-use-preview model with both a text prompt and my own screenshot.
In the end I figured out the solution. The correct request looks like this:

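from openai import OpenAI

client = OpenAI()

# screenshot_base64 holds the initial screenshot as a base64-encoded PNG string
response = client.responses.create(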
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "windows" # "mac", "browser", "ubuntu"
    }],
    input=[
        {
            "role": "user",
            "content": "run demo app"
        },
        
        {
            "role": "user",
            "content": [{
              "type": "input_image",
              "image_url": f"data:image/png;base64,{screenshot_base64}"
            }]
        }
    ],
    reasoning={
        "generate_summary": "concise",
    },
    truncation="auto"
)

print(response.output)
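The request above assumes screenshot_base64 already exists. Here is a minimal sketch of producing it with only the standard library, assuming your screenshot is saved as a PNG file. The helper names and the path you pass in are hypothetical, not part of the OpenAI SDK:

```python
import base64
from pathlib import Path


def screenshot_to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URL usable as an input_image image_url."""
    screenshot_base64 = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{screenshot_base64}"


def load_screenshot_data_url(path: str) -> str:
    """Read a PNG file from disk (path is up to you) and return it as a data URL."""
    return screenshot_to_data_url(Path(path).read_bytes())
```

You would then use the returned string as the "image_url" value, e.g. load_screenshot_data_url("screenshot.png") where the file name is just an example.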

I hope someone finds this useful, because when I needed it I found zero posts about using computer-use-preview from OpenAI.


In the content list you can also add an item of type “input_text”, so that a single user message carries both the text and the input_image, and it works with multiple images too.

input=[
    {
        "role": "user",
        "content": [
            { "type": "input_text", "text": "what is in this image?" },
            {
                "type": "input_image",
                "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
        ]
    },
]
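Applied to the computer-use request, this means the prompt and the screenshot can go in one user message instead of two. A small sketch, assuming the content-part shapes shown in the snippet above; build_user_message is a hypothetical helper, the screenshot bytes are a placeholder, and the API call itself is left out so the code runs without a key:

```python
import base64


def build_user_message(prompt: str, image_urls: list[str]) -> dict:
    """Build one user message whose content mixes an input_text part and input_image parts."""
    content = [{"type": "input_text", "text": prompt}]
    for url in image_urls:
        # Each image becomes its own input_image part inside the same message.
        content.append({"type": "input_image", "image_url": url})
    return {"role": "user", "content": content}


# Placeholder bytes; in practice read your real PNG screenshot from disk.
screenshot_base64 = base64.b64encode(b"...").decode("ascii")
input_items = [
    build_user_message("run demo app", [f"data:image/png;base64,{screenshot_base64}"])
]
```

You would then pass input=input_items to client.responses.create(...) with the same model, tools, reasoning and truncation arguments as before.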

Thanks for the correction! I spent half a day just figuring out how to send a request without an error, so I never thought about optimising…
