Try Reducing the max_tokens then it will generate less tokens = less cost. I hope these below resource will help.
{
"type": "image_url",
"image_url": {
"url": "https://commons.wikimedia.org/wiki/Main_Page#/media/File:Die_Woche_Der_Tag_von_Potsdam_cover.jpg",
"detail": "low"
}
}
]
}
],
"max_tokens": 300
https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding