Calculate max_tokens for completion requests to GPT-4o with an image

Hi there. I’m wondering what the best way is to calculate max_tokens for a completion request to the gpt-4o model when the request contains an image and I want the response to be as long as possible.

Without images I can use tiktoken to get the input’s length in tokens and then calculate max_tokens as model_context_length - input_length. But, if I’m not mistaken, that doesn’t work when the request includes an image, because tiktoken only counts text tokens.
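To make the text-only calculation concrete, here is a minimal sketch of what I mean. The helper names (`count_text_tokens`, `remaining_token_budget`) and the `output_cap` and `margin` parameters are my own for illustration, not part of any library. I'm assuming the `o200k_base` encoding for gpt-4o, and that a small margin is needed because chat messages carry a few formatting tokens beyond the raw text.

```python
def count_text_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    """Count tokens in plain text using tiktoken (third-party, imported lazily)."""
    import tiktoken  # pip install tiktoken

    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


def remaining_token_budget(context_length, input_tokens, output_cap=None, margin=0):
    """Return max_tokens = context_length - input_tokens, with optional limits.

    output_cap: assumed separate per-response output limit, if the model has one.
    margin: assumed safety buffer for per-message formatting overhead.
    """
    remaining = context_length - input_tokens - margin
    if output_cap is not None:
        remaining = min(remaining, output_cap)
    return max(remaining, 0)  # never return a negative budget
```

For example, `remaining_token_budget(128000, count_text_tokens(prompt_text), margin=50)` would give the budget for a text-only prompt; the numbers here are placeholders, so check the actual limits for the model you use.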

Below you can see my prompts (simplified):

messages = [
    {
        "role": "system",
        # Placeholder for the (long) system prompt
        "content": (
            "{really long prompt}"
        )
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Here is an image:"
            },
            {
                # Image sent inline as a base64-encoded data URL
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{encoded_image}",
                },
            },
            {
                "type": "text",
                # f-string so the JSON is actually interpolated into the prompt
                "text": f". Here is the json with the lines:\n{detection_results_json}."
            }
        ]
    }
]
completion = await client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=temperature,
    max_tokens=max_tokens
)
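
For reference, here is a rough sketch of the tile-based image token estimate that, as far as I understand it, OpenAI's vision guide describes for GPT-4o: a flat 85 tokens for low detail, and for high detail 85 base tokens plus 170 per 512px tile after the image is resized to fit within 2048x2048 and its shortest side is scaled down to 768px. The function name `estimate_image_tokens` is hypothetical, the constants are assumptions to verify against the current documentation, and I'm assuming small images are not upscaled.

```python
import math


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image input tokens for GPT-4o (assumed tile-based formula)."""
    base_tokens, tile_tokens, tile_size = 85, 170, 512  # assumed constants

    if detail == "low":
        return base_tokens  # low detail is assumed to cost a flat amount

    w, h = float(width), float(height)

    # Step 1: scale down to fit within a 2048 x 2048 square.
    if max(w, h) > 2048:
        scale = 2048 / max(w, h)
        w, h = w * scale, h * scale

    # Step 2: scale down so the shortest side is 768px (no upscaling assumed).
    if min(w, h) > 768:
        scale = 768 / min(w, h)
        w, h = w * scale, h * scale

    # Step 3: count the 512px tiles needed to cover the resized image.
    tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
    return base_tokens + tile_tokens * tiles
```

If this estimate holds, the image cost could be added to the tiktoken count of the text parts before subtracting from the context length. The dimensions would have to come from the decoded image, not from the length of the base64 string.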