Any workarounds for function calling using gpt-4-vision-preview?

As everyone is aware, gpt-4-vision-preview does not have function calling capabilities yet. Therefore, there’s no built-in way to provide external context to the GPT-4V model beyond what the “system”, “assistant”, or “user” messages contain.

I’m curious if anyone has figured out a workaround to make sure the external context is injected in a reliable manner?

A different form of this question would be:

Any idea if there is a way to use gpt-4-vision-preview as the base model for a ReAct agent?

gpt-4-vision-preview doesn’t take functions, and doesn’t take logprobs or logit_bias either. They pretty much keep the utility of ChatGPT to themselves.

You’d likely have to give a regular model an “analyze image” function, with a “prompt” parameter for the text you want back. Then inject a post-prompt that says “user has made banana-tree.jpg available to the analyze image function”.
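Roughly something like this, assuming the OpenAI Python client; the `analyze_image` tool name, its schema, and the model name are just placeholders:

```python
# Hypothetical sketch: give a regular function-calling model an "analyze_image"
# tool whose "prompt" parameter is the question we want answered about the image.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "analyze_image",
        "description": "Ask a vision model a question about an image the user has provided.",
        "parameters": {
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": "The question to ask about the image.",
                },
            },
            "required": ["prompt"],
        },
    },
}]

messages = [
    {"role": "user", "content": "What kind of tree is in my photo?"},
    # Injected post-prompt telling the model an image is available to the tool.
    {"role": "system",
     "content": "User has made banana-tree.jpg available to the analyze_image function."},
]

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # any regular function-calling model
    messages=messages,
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```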

Function calling is just a way to easily structure the input and output for tool calling. You can use GPT-4V in the same way, just without the structured output. For example, tell it that if it needs to use tool X, it should output JSON with the tool name and parameters; then parse that output as you would a function call, and simply add the tool’s result back as another regular user message.
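A minimal sketch of that, assuming the OpenAI Python client; the tool described in the system prompt and the `run_tool` dispatcher are made up for illustration:

```python
# Prompt-only "function calling" with gpt-4-vision-preview: describe the tools
# in the system prompt, ask for JSON, and parse the reply yourself.
import json
from openai import OpenAI

client = OpenAI()

system = (
    "You can use this tool:\n"
    "- lookup_product(query: string): searches the product catalog.\n"
    "If you need the tool, reply with ONLY a JSON object like "
    '{"tool": "lookup_product", "arguments": {"query": "..."}}. '
    "Otherwise answer normally."
)

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": [
        {"type": "text", "text": "Find this item in the catalog."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ]},
]

response = client.chat.completions.create(
    model="gpt-4-vision-preview", messages=messages, max_tokens=300
)
reply = response.choices[0].message.content

try:
    call = json.loads(reply)  # the model asked for a tool
    result = run_tool(call["tool"], call["arguments"])  # your own dispatcher (hypothetical)
    # Feed the tool output back in as a plain user message and call the model again.
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": f"Tool result: {result}"})
except json.JSONDecodeError:
    print(reply)  # normal answer, no tool needed
```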

How about using function calling to make it use vision?

- Standard non-vision model with function calling.
- An `image_conversation` function, for when the user wants information about an image in the context.
- Parameter: `url` (the image URL).
- The `image_conversation` handler then makes the call to the vision model with that image and the relevant context (see the sketch below).

Just a rando thought.

(Or function calling for the first response, and the follow-up uses vision?)
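Something like this, assuming the OpenAI Python client; `image_conversation`, the model names, and the example URL are all placeholders:

```python
# Sketch of the routing idea above: a non-vision model decides to call
# image_conversation, and the handler forwards the question plus image URL
# to gpt-4-vision-preview.
import json
from openai import OpenAI

client = OpenAI()

def image_conversation(question: str, url: str) -> str:
    """Answer a question about an image by calling the vision model."""
    vision = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": url}},
        ]}],
        max_tokens=300,
    )
    return vision.choices[0].message.content

# First pass: the standard model picks the function and fills in its arguments.
first = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user",
               "content": "What breed is the dog at https://example.com/dog.jpg?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "image_conversation",
            "description": "Get information about an image in the conversation.",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {"type": "string"},
                    "url": {"type": "string", "description": "The image URL."},
                },
                "required": ["question", "url"],
            },
        },
    }],
)

call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print(image_conversation(args["question"], args["url"]))
```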

You can use AlphaWave with GPT-4V to reliably return a JSON object specifying the name of a function to call and the parameters to pass to that function.

There’s nothing magical that OpenAI is doing to support function calling. They’re just adding some text to the end of your prompt describing a list of available functions and then asking the model to return some JSON. You can do that just as easily yourself.

AlphaWave actually enables more reliable function calling because it not only ensures the model returns valid JSON but it also schema validates everything. It makes it impossible for the model to call an invalid function or to return invalid parameters. OpenAI makes no such guarantees.
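For illustration only (this isn’t AlphaWave’s actual API), here’s a rough sketch of that do-it-yourself approach: describe the function in the prompt, ask the model for JSON, and schema-validate the reply before acting on it. The `save_caption` function and its schema are made up:

```python
# Describe the function in the prompt, ask gpt-4-vision-preview for JSON,
# and validate the reply against a schema before acting on it.
import json
from jsonschema import ValidationError, validate
from openai import OpenAI

client = OpenAI()

call_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "enum": ["save_caption"]},
        "arguments": {
            "type": "object",
            "properties": {"caption": {"type": "string"}},
            "required": ["caption"],
        },
    },
    "required": ["name", "arguments"],
}

prompt = (
    "Available function: save_caption(caption: string).\n"
    'Reply with ONLY a JSON object like {"name": ..., "arguments": {...}}.'
)

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {"role": "system", "content": prompt},
        {"role": "user", "content": [
            {"type": "text", "text": "Caption this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ]},
    ],
    max_tokens=200,
)

try:
    call = json.loads(response.choices[0].message.content)
    validate(instance=call, schema=call_schema)  # rejects invalid names or parameters
    print("valid call:", call)
except (json.JSONDecodeError, ValidationError) as err:
    # In practice you'd feed the error back to the model and ask it to retry.
    print("model returned an invalid call:", err)
```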

See update here: