As everyone is aware, gpt-4-vision-preview does not have function calling capabilities yet. Therefore, there’s no way to provide external context to the GPT-4V model outside of what the “System”, “Assistant”, or “User” messages supply.
I’m curious if anyone has figured out a workaround for injecting external context reliably?
A different form of this question would be:
Any idea if there is a way to use gpt-4-vision-preview as the base model for a ReAct agent?
gpt-4-vision doesn’t accept functions, logprobs, or logit_bias. They pretty much keep the utility of ChatGPT to themselves.
You’d likely have to give a regular model an “analyze image” function with a “prompt” parameter describing the text you want back, then inject a post-prompt along the lines of “the user has made banana-tree.jpg available to the analyze image function”. A sketch of that pattern is below.
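Here’s a rough sketch of that delegation pattern using the openai Python SDK (v1.x). The planner model, the `analyze_image` tool, its parameters, and the example URL are all illustrative assumptions, not anything OpenAI ships:

```python
# Sketch: give a function-calling model (e.g. gpt-4-1106-preview) an "analyze_image"
# tool that delegates to gpt-4-vision-preview. Tool and parameter names are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "analyze_image",
        "description": "Look at an image the user has made available and answer a question about it.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_url": {"type": "string", "description": "URL of the image to analyze."},
                "prompt": {"type": "string", "description": "What to look for or describe in the image."},
            },
            "required": ["image_url", "prompt"],
        },
    },
}]

def analyze_image(image_url: str, prompt: str) -> str:
    """Forward the prompt and image to the vision model and return its answer."""
    vision = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return vision.choices[0].message.content

messages = [
    {"role": "system", "content": "You can call analyze_image when the user refers to an image."},
    # Post-prompt telling the planner model which image is available.
    {"role": "user", "content": "The user has made https://example.com/banana-tree.jpg "
                                "available to the analyze_image tool. Is the tree healthy?"},
]

response = client.chat.completions.create(
    model="gpt-4-1106-preview", messages=messages, tools=TOOLS)
msg = response.choices[0].message

if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = analyze_image(**args)
    # Feed the vision model's answer back to the planner as the tool result.
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(model="gpt-4-1106-preview", messages=messages)
    print(final.choices[0].message.content)
```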
Function calling is just a way to structure the input and output for tool calling. You can use GPT-4-vision the same way, just without the structured output: tell it that if it needs to use tool X, it should output JSON with the tool name and parameters, then parse that output as you would a function call and add the tool’s result back as a regular user message. A sketch of this approach follows.
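For example, a minimal sketch of that DIY approach, where the `get_weather` tool, the JSON convention, and the image URL are made up for illustration:

```python
# Sketch: roll your own "function calling" with gpt-4-vision-preview by asking it
# to emit a JSON tool call and parsing it yourself. This is a prompting convention,
# not an API feature of the vision model.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(location: str) -> str:
    # Stub tool implementation; replace with a real weather lookup.
    return f"Sunny and 22°C in {location}"

SYSTEM = (
    "You can use the following tool:\n"
    "  get_weather(location: string) -> current weather for a location\n"
    'If you need the tool, reply with ONLY a JSON object like '
    '{"tool": "get_weather", "arguments": {"location": "..."}}. '
    "Otherwise answer the user directly."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": [
        {"type": "text", "text": "What's the weather like where this photo was taken?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/eiffel-tower.jpg"}},
    ]},
]

reply = client.chat.completions.create(
    model="gpt-4-vision-preview", messages=messages, max_tokens=300
).choices[0].message.content

try:
    call = json.loads(reply)                        # model chose to call the tool
    weather = get_weather(**call["arguments"])      # run the tool yourself
    # Hand the tool output back as a plain user message and ask again.
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": f"Tool result: {weather}"})
    final = client.chat.completions.create(
        model="gpt-4-vision-preview", messages=messages, max_tokens=300)
    print(final.choices[0].message.content)
except (json.JSONDecodeError, KeyError, TypeError):
    print(reply)                                    # ordinary answer, no tool call
```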
You can use AlphaWave with GPT-4V to reliably return a JSON object specifying the name of a function to call and the parameters to pass that function.
There’s nothing magical that OpenAI is doing to support function calling. They’re just appending text to your prompt describing the available functions and asking the model to return some JSON. You can do that just as easily yourself.
AlphaWave actually enables more reliable function calling because it not only ensures the model returns valid JSON, it also schema-validates everything. That makes it impossible for the model to call an invalid function or return invalid parameters; OpenAI makes no such guarantee. A rough sketch of that validation step is below.
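If you want to roll that validation step yourself, here’s roughly what it looks like with the jsonschema package. This is the general idea, not AlphaWave’s actual implementation, and the tool names in the schema are placeholders:

```python
# Sketch: validate the model's JSON against a tool-call schema before executing it,
# and return None so the caller can ask the model to retry on failure.
import json

import jsonschema
from jsonschema import ValidationError

TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "tool": {"type": "string", "enum": ["analyze_image", "get_weather"]},
        "arguments": {"type": "object"},
    },
    "required": ["tool", "arguments"],
}

def parse_tool_call(model_output: str) -> dict | None:
    """Return a validated tool call, or None if the output is malformed."""
    try:
        call = json.loads(model_output)
        jsonschema.validate(call, TOOL_SCHEMA)   # rejects unknown tools / bad shapes
        return call
    except (json.JSONDecodeError, ValidationError):
        return None
```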