Assistants API Function Tools

Hopefully I haven’t missed something here, but I’m struggling to get my assistant to properly call its function. The function should be used whenever the assistant gets an image as part of the message. If I give the assistant just text it works fine, but if I give it an image and text, it hallucinates my entire input.
Here are some examples of the createMessage function I’ve tried.
V1:

const createMessage = async (threadId, userMessage) => {
        console.log('create message triggered')
        try {
            if (userMessage.image) {
                console.log(userMessage.image.split(',')[0])
                console.log('create message file:', userMessage.file)
                await fetch(`https://api.openai.com/v1/threads/${threadId}/messages`, {
                    method: "POST",
                    headers: {
                        "Content-Type": "application/json",
                        'Authorization': `Bearer APIKEY`,
                        'OpenAI-Beta': 'assistants=v1',
                    },
                    body: JSON.stringify({
                        role: "user",
                        content: userMessage,
                        tool: "analyzeImage"
                    })
                });
                console.log('User message sent to API contains image:',);
            } 

V2:

const createMessage = async (threadId, userMessage) => {
        console.log('create message triggered')
        try {
            if (userMessage.image) {
                console.log(userMessage.image.split(',')[0])
                console.log('create message file:', userMessage.file)
                await fetch(`https://api.openai.com/v1/threads/${threadId}/messages`, {
                    method: "POST",
                    headers: {
                        "Content-Type": "application/json",
                        'Authorization': `Bearer APIKEY`,
                        'OpenAI-Beta': 'assistants=v1',
                    },
                    body: JSON.stringify({
                        role: "user",
                        inputs: {
                            text: userMessage.text,  // Pass along any user-entered text
                            data: {
                                file: userMessage.file,
                                image: userMessage.image.split(",")[1]  // Split the string and take only the base64 part
                            }
                        },
                        tool: "analyzeImage"
                    })
                });
                console.log('User message sent to API contains image:',);
            } 

The function tool is called analyzeImage

  "name": "analyzeImage",
  "parameters": {
    "type": "object",
    "properties": {
      "image": {
        "type": "string",
        "contentMediaType": "image/jpeg",
        "description": "The base64-encoded string of the image."
      }
    },
    "required": [
      "image",
    ]
  },

I saw an example from Geoligard where he uses an actual file URL, but since this is being done on mobile I was trying to use just the base64 version of the image. Any input as to why I’m getting hallucinations, and why it almost never triggers the function, would be appreciated.


Hello and welcome to the community!

So, I think the big misconception here is the base64 string itself.

The file URL they were using is likely there for a reason. The major flaw here is the assumption that the AI will automatically be able to decode the string. It cannot decode strings automatically; you would need to either handle the encoding/decoding yourself, or surgically manipulate the tokenizer.

If you are passing it a description of an image like this:

aGVsbG8gd29ybGQh

It is either going to be interpreted as a weird token the model has never seen before and hallucinate like crazy, or be treated as junk and ignored entirely (if it even can).

The model needs to be fed interpretable data. File URLs work because the AI is already well trained on interpreting such URLs. It is not trained to be a base64 decoder.

What you are better off doing, instead of encoding the image as base64, is storing an image description (for example as a vector embedding in a database). When the function is called, you can look that record up and inject the actual textual description of the image.
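Something along these lines, purely as a sketch of the idea (the embedding model, the in-memory store, and the helper names here are my own assumptions, not anything from your code):

// Sketch: store an image *description* (not raw base64), optionally with an
// embedding for lookup, and hand the text back when the tool fires.
const imageStore = new Map(); // imageId -> { description, vector } (stand-in for a real db)

const embedText = async (text) => {
    const res = await fetch("https://api.openai.com/v1/embeddings", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            'Authorization': `Bearer APIKEY`,
        },
        body: JSON.stringify({ model: "text-embedding-ada-002", input: text }),
    });
    const data = await res.json();
    return data.data[0].embedding;
};

// At upload time: index a textual description of the image.
const indexImageDescription = async (imageId, description) => {
    const vector = await embedText(description);
    imageStore.set(imageId, { description, vector });
};

// At tool-call time: inject the stored *text*, not an encoded blob.
const lookupImageDescription = (imageId) =>
    imageStore.get(imageId)?.description ?? "No description stored for this image.";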


Thanks for the input. Your explanation does explain why the AI hallucinates when I add an image that is just a super long string of random characters. I thought I had seen somewhere that base64 was an acceptable way to send the files, but maybe not. Do you have any thoughts on the format of the body being sent, V1 vs V2? Does adding that “tool” property do anything? I’ll try a file URL and see what happens. Thanks again for the help so far!


Okay, so I dug around for a minute, and I think I might be seeing where the confusion is.

I’m looking at this for reference:

https://platform.openai.com/docs/guides/vision

Here’s the gist:

  • You were probably looking at how they used GPT-4 Vision, which can be passed a base64-encoded image string. See here:

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

  • gpt-4-v only works with chat completions, not the Assistants API.

Does this mean you can’t build your tool? No, but what it does mean is that you will likely have to call the chat completions API with vision separately in order to retrieve a description of the image, which you can then pass into the assistant’s context/query.

Keep in mind the tool property is required if you’re trying to perform any kind of function call with the model. Everything else is up to you. There’s not as much advice I can give on the version differences because of the problem above. I also personally prefer this kind of work to be done on a backend instead of in client-side JS, but that’s a “me” thing and you are more than okay to continue as-is.
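As a rough illustration of that wrapper idea, something like this could sit behind the tool (just a sketch; the helper name, prompt, and max_tokens are assumptions on my part):

// Sketch: the assistant's "analyzeImage" tool is a thin wrapper around a separate
// chat-completions call to the vision model, returning a plain-text description.
const describeImage = async (imageUrl) => {
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            'Authorization': `Bearer APIKEY`,
        },
        body: JSON.stringify({
            model: "gpt-4-vision-preview",
            messages: [{
                role: "user",
                content: [
                    { type: "text", text: "Describe this image in detail." },
                    { type: "image_url", image_url: { url: imageUrl, detail: "auto" } },
                ],
            }],
            max_tokens: 500,
        }),
    });
    const data = await res.json();
    return data.choices[0].message.content; // text you can pass into the assistant's context
};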



Macha, I appreciate all your help yesterday. I tried uploading a URL (you can see the URL and the run in the image above). The upload code is below, and I adjusted the function tool to match. But once again I’m in a spot where the run won’t hit requires_action to make use of the function tool, which, yes, does call the Vision API separately and then returns the description of the image back to the assistant. It’s also still hallucinating when there is an image, even when it’s passed as a URL. Any thoughts? I’m wondering whether I could add another property that would trigger it, like a containsImage property that’s true/false, to see if that might trigger the function.

body: JSON.stringify({
    role: "user",
    text: userMessage.text,
    image_url: userMessage.image_url,
})

function tool:

  "name": "analyzeImage",
  "parameters": {
    "type": "object",
    "properties": {
      "image_url": {
        "type": "string",
        "contentMediaType": "image/jpeg",
        "description": "The url of the image sent by the user."
      },
      "text": {
        "type": "string",
        "description": "Message the user sent along with the image, without the URL itself"
      }
    },
    "required": [
      "image_url",
      "text"
    ]
  },

Of course! Always happy to help!

Can we see the updated script again in full? Usually in these situations, when I’m reviewing my own code, I like to “pretend” I’m the computer and go through the logic flow of the code step-by-step to make sure it’s flowing the right way.

Also, I can’t tell from this function call whether gpt-4-v is being called and used properly. My thinking was more along the lines of: ensure the tool is a wrapper for the gpt-4-v call, but I don’t necessarily see evidence of that here. Then again, at this point, it’s becoming more of a “how might I find a hacky workaround to retrieve vision data via an Assistants API call” exercise.

Macha, you’re very kind to keep digging on this.
So below is my createMessage:

 const createMessage = async (threadId, userMessage) => {
        console.log('create message triggered')
        try {
            if (userMessage.image_url) {
                console.log('create message file:', userMessage.image_url)
                await fetch(`https://api.openai.com/v1/threads/${threadId}/messages`, {
                    method: "POST",
                    headers: {
                        "Content-Type": "application/json",
                        'Authorization': `Bearer APIKEY`,
                        'OpenAI-Beta': 'assistants=v1',
                    },
                    body: JSON.stringify({
                        role: "user",
                        text: userMessage.text,
                        image_url: userMessage.image_url,
                        containsImage: true,
                       
                    })
                });
                console.log('User message sent to API contains image:',);
            } else {
                await fetch(`https://api.openai.com/v1/threads/${threadId}/messages`, {
                    method: "POST",
                    headers: {
                        "Content-Type": "application/json",
                        'Authorization': `Bearer APIKEY`,
                        'OpenAI-Beta': 'assistants=v1',
                    },
                    body: JSON.stringify({
                        role: "user",
                        content: userMessage.text
                    })
                });
                console.log('User message sent to API does not have an image:',);
            }
        } catch (error) {
            console.error("Error sending user message:", error);
        }
    };
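For what it’s worth, I’m not sure the extra keys in that first body are even part of the documented message schema; v1 only documents role, content (a string), and file_ids, so an alternative I may try is folding the URL into the content string, something like this sketch (the helper name is just mine):

// Sketch (assumption): stay within the documented v1 message schema by folding the
// image URL into the content string instead of sending extra top-level keys.
const buildMessageBody = (userMessage) => ({
    role: "user",
    content: userMessage.image_url
        ? `${userMessage.text}\n\nAttached image URL: ${userMessage.image_url}`
        : userMessage.text,
});

// usage: body: JSON.stringify(buildMessageBody(userMessage))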

Here is my checkRun, which should pick up the requires_action status:

 const checkRun = async (threadId, runId) => {
        try {
            let completed = false;
            let retries = 0;
            const maxRetries = 120; // Maximum retries before giving up
            const timeoutMs = 1200000; // Maximum time to wait in milliseconds (20 minutes)
            const startTime = Date.now();

            while (!completed && retries < maxRetries && !awaitingResults) {
                retries++;
                setTyping(true)
                const statusCheckResponse = await fetch(
                    `https://api.openai.com/v1/threads/${threadId}/runs/${runId}`, {
                    method: "GET",
                    headers: {
                        "Content-Type": "application/json",
                        'Authorization': `Bearer APIKEY`,
                        'OpenAI-Beta': 'assistants=v1',
                    },
                }
                );

                if (!statusCheckResponse.ok) {
                    console.error("API request failed at check status:", statusCheckResponse.statusText);
                    break; // Exit the loop if the status check request fails
                }

                const runResponse = await statusCheckResponse.json();

                switch (runResponse.status) {
                    case 'completed':
                        completed = true;
                        console.log('Run completed, fetching messages...');
                        try { // Try-catch block specifically for fetchMessages
                            await fetchMessages(threadId);
                        } catch (error) {
                            console.error('Error fetching messages:', error);
                        } finally {
                            setAwaitingResponse(false);
                        }
                        break;
                    case 'in_progress':
                        console.log('The run is currently in progress.');
                        break;
                    case 'requires_action':
                        console.log('requires action');
                        // Access the array of required tool outputs
                        const toolOutputs = await runResponse.required_action.submit_tool_outputs?.tool_calls;
                        console.log('Tool id from requires action ', toolOutputs);
                        const toolId = toolOutputs[0].id
                        if (toolOutputs && toolOutputs.length > 0) {
                            setAwaitingResults(true);
                            try {
                                console.log('Tool id is : ', toolOutputs[0].id)
                                console.log('Tool image is: ', toolOutputs[0].image_base64)
                                await analyzeImage(threadId, runId, toolId, toolOutputs)
                             
                            } catch (error) {
                                console.error('Error processing tool output:', error);
                                setAwaitingResults(false);
                            }
                        } else {
                            console.log('No tool outputs to process');
                        }
                        break;
                    case 'failed':
                    case 'cancelled':
                        completed = true;
                        console.log(`The run has ${runResponse.status}.`);
                        break;
                    default:
                        console.log(`Unknown status: ${runResponse.status}`);
                        break;
                }

                // If not completed, wait for 1 second before the next poll
                if (!completed) {
                    if ((Date.now() - startTime) > timeoutMs) {
                        console.log('Operation timed out.');
                        break;
                    }
                    await new Promise(resolve => setTimeout(resolve, 1000));
                }
            }

            if (!completed) {
                console.log(`Run did not complete after ${maxRetries} retries or timed out.`);
                // Perform any necessary cleanup or notification
            }
        } catch (error) {
            console.error("Error checking run:", error);
            console.error("Response Status:", error.response.status);
            console.error("Response Text:", await error.response.text());
        } finally {
            setAwaitingResponse(false); // This should be the way to update your loading state
        }
    };

Here is my Vision API call:

const analyzeImage = async (threadId, runId, toolId, toolOutputs) => {
    try {
        console.log('Starting to analyze image');
        console.log('tool outputs inside analyze image: ',toolOutputs)
        console.log('Image is: ', toolOutputs[0].image_base64)
        const startTime = performance.now();
        const response = await fetch("https://api.openai.com/v1/chat/completions", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                'Authorization': `Bearer APIKEY`, 
            },
            body: JSON.stringify({
                model: "gpt-4-vision-preview",
                messages: [
                    {
                        role: "system",
                        content: "You are a helpful assistant."
                    },
                    {
                        role: "user",
                        content: [
                            { type: "text", text: "Custom Instructions" },
                            {
                                type: "image_url",
                                image_url: {
                                    "url": "hard coded test image",
                                    'detail': 'auto'
                                },
                            },
                        ],
                    }
                ],
                max_tokens: 2500
            })
        });

        console.log('waiting on response from vision api');
        if (response.ok) {
            const endTime = performance.now();
            
            const data = await response.json();
            console.log("main response:", data.choices[0].message.content);
            const elapsedTime = endTime - startTime;
            console.log(`analyzeImage took ${elapsedTime} milliseconds to execute.`);
            const results = data.choices[0].message.content;
            await sendBack(results, threadId, runId, toolId);
        } else {
            console.error("Request failed with status:", response.status);
        }
    } catch (error) {
        console.log('Failed to analyze image: ', error);
    }
};

export {
    analyzeImage
}
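sendBack isn’t shown above; for reference, it’s roughly this shape, submitting the vision result back to the run via the submit_tool_outputs endpoint (a sketch, so the exact body may differ from what I’m running):

// Sketch: return the vision result to the waiting run as a tool output.
const sendBack = async (results, threadId, runId, toolId) => {
    await fetch(`https://api.openai.com/v1/threads/${threadId}/runs/${runId}/submit_tool_outputs`, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            'Authorization': `Bearer APIKEY`,
            'OpenAI-Beta': 'assistants=v1',
        },
        body: JSON.stringify({
            tool_outputs: [
                { tool_call_id: toolId, output: results },
            ],
        }),
    });
};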

Alright, haha, this is definitely giving me a run for my money, but I like challenges!

So, looking at your previous message (btw, thank god you have logs, I’m terrible at that), it looks like it is successfully hitting the image branch of your if statement in createMessage.

Then the run goes straight down the ‘completed’ path of your switch statement, and the ‘requires_action’ case never fires.

So, the code looks right, but the model does seem to be ignoring the function call.

After looking at the actual assistant’s custom tool call schema seen here:

assistant = client.beta.assistants.create(
  instructions="You are a weather bot. Use the provided functions to answer questions.",
  model="gpt-4-turbo-preview",
  tools=[{
      "type": "function",
    "function": {
      "name": "getCurrentWeather",
      "description": "Get the weather in location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "The city and state e.g. San Francisco, CA"},
          "unit": {"type": "string", "enum": ["c", "f"]}
        },
        "required": ["location"]
      }
    }
  }, {
    "type": "function",
    "function": {
      "name": "getNickname",
      "description": "Get the nickname of a city",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "The city and state e.g. San Francisco, CA"},
        },
        "required": ["location"]
      }
    } 
  }]
)

This leads me to conclude your tool definition may not be right: it looks like it’s missing the "type": "function" wrapper around the function object. You need to match that definition format properly, otherwise the model won’t be able to identify it as a tool it has available to use.

Fix this, and it should start picking up the tool. The rest of the flow looks fine; it’s just getting the AI to pick up the tool that’s the problem.

Macha, you might be onto something here. Here is the documentation version:

Defining functions
First, define your functions when creating an Assistant:
curl https://api.openai.com/v1/assistants \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "OpenAI-Beta: assistants=v1" \
  -d '{
    "instructions": "You are a weather bot. Use the provided functions to answer questions.",
    "tools": [{
      "type": "function",
      "function": {
        "name": "getCurrentWeather",
        "description": "Get the weather in location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "The city and state e.g. San Francisco, CA"},
            "unit": {"type": "string", "enum": ["c", "f"]}
          },
          "required": ["location"]
        }
      }	
    },

OK, so I modified this for my use, which looks something like this:

{
      "type": "function",
      "function": {
        "name": "analyzeImage",
        "description": "custom description.",
        "parameters": {
          "type": "object",
          "properties": {
            "image_url": {
              "type": "string",
              "contentMediaType": "image/jpeg", 
              "description": "The url of the image sent by the user. Starts with https://"
            }
          },
          "required": ["image_url"]
        }
      }	
    }

It then says I need to give the function a name, so I put "name": above the type and click save. When I reopen it to examine it, it shows this:

{
  "name": "analyzeImage",
  "parameters": {
    "type": "object",
    "properties": {},
    "required": []
  }
}

I think you’re on the right track here, but the documentation seems to be a little off. I’ll keep exploring this.
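One thing I might try to rule out the dashboard editor: attach the tool when creating the assistant through the API directly, since the dashboard’s function dialog seems to expect just the inner object (name/description/parameters) while the API request wraps it in "type": "function". A rough sketch (the model and instructions below are placeholders):

// Rough sketch (assumption): attach the tool while creating the assistant through
// the API directly, using the wrapped { type, function } form from the docs.
const createAssistant = async () => {
    const res = await fetch("https://api.openai.com/v1/assistants", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            'Authorization': `Bearer APIKEY`,
            'OpenAI-Beta': 'assistants=v1',
        },
        body: JSON.stringify({
            model: "gpt-4-turbo-preview",
            instructions: "custom instructions",
            tools: [{
                type: "function",
                function: {
                    name: "analyzeImage",
                    description: "custom description.",
                    parameters: {
                        type: "object",
                        properties: {
                            image_url: {
                                type: "string",
                                description: "The url of the image sent by the user. Starts with https://",
                            },
                        },
                        required: ["image_url"],
                    },
                },
            }],
        }),
    });
    return res.json();
};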


@Macha, we’re back! After doing some more checking, it seems like the Assistants API has trouble with functions, as encountered by a number of users (plus some issues this past week). I thought I’d give it a try with chat completions, which also has function calling. But when I got to the point of sending it an image, it was console logging a 400 error. ChatGPT suggested I run it through Postman, which I did. Postman tells me:

{
    "error": {
        "message": "Invalid content type. image_url is only supported by certain models.",
        "type": "invalid_request_error",
        "param": "messages.[0].content.[1].type",
        "code": null
    }
}

Problem is I used exactly what the docs say:

Function calling
https://platform.openai.com/docs/guides/function-calling/function-calling

Learn how to connect large language models to external tools.

Introduction
https://platform.openai.com/docs/guides/function-calling/introduction

In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call one or many functions. The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code.

The latest models (gpt-3.5-turbo-0125 and gpt-4-turbo-preview) have been trained to both detect when a function should be called (depending on the input) and to respond with JSON that adheres to the function signature more closely than previous models. With this capability also comes potential risks. We strongly recommend building in user confirmation flows before taking actions that impact the world on behalf of users (sending an email, posting something online, making a purchase, etc).

Here’s my Postman test body in JSON:

{
  "model": "gpt-3.5-turbo-0125",
  "messages": [{
    "role": "user",
    "content": [
   {
      "type": "text",
      "text": "custom question"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "test url"
      }
    }
]
  }],
  "max_tokens": 839,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "tools": [
        {
            "type": "function",
            "function": {
                "name": "getAnalyzeImage",
                "description": "custom instructions",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "image_url": {
                            "type": "string",
                            "contentMediaType": "image/jpeg",
                            "description": "The url of the image sent by the user."
                        }
                    },
                    "required": ["image_url"]
                }
            }
        }
    ],
  "tool_choice": "auto"
}

If it won’t trigger the function to analyze the image because I’ve sent it the image_url, how else could I trigger the function and still have it run correctly? Is it just not possible for any model to get an image as part of a user input and then have it trigger a function to actually review that image?


Welcome back! 🙂

Well, this one’s tough, and as you point out, it’s becoming increasingly difficult to tell where some of the problems lie, because sometimes the API itself does have issues.

That being said, I don’t like making a blanket statement saying it wouldn’t be possible, but I do wonder whether what’s being attempted is more complex than initially anticipated.

How about this: why not simply try using gpt-4-vision first? Think of it like a diagnostic: if you can successfully use gpt-4-v, which does take an image URL, we can verify that there is (or should be) a legitimate pathway to accomplish the task with an image URL as input.

The next step after would be seeing how to align what you want with what can be done.

If I had to guess, I wonder if some of the issues have to do with file path resolution. Or, to put it more precisely, there might be a discrepancy between the URL and the actual file itself. A URL is not entirely synonymous with a file path; perhaps that is where the bottleneck is?

I apologize I can’t be of more help!

@Macha I had previously given thought to your idea of sending it to the Vision API first, but I don’t think that gets me where I’m trying to go. The issue is that I can already do my task in a custom GPT I made: I can ask a question, attach a photo, and bam, I get exactly what I’m looking for. The problem is I know a number of people who don’t want to pay the $20/month for ChatGPT Plus to access it. Some are willing to pay, say, $5/month for just my GPT, though. To give you an idea of what I’m trying to accomplish (this isn’t my idea but it’s close enough): I want to take a picture of a restaurant menu and then ask, “I have a dairy allergy, is there anything on here I should avoid?” So I ask my question and attach a photo, which triggers a function, which sends the image to the Vision API, which then feeds back a list of the menu items with any ingredients/descriptions listed. The chat completions or Assistants API now has my question and the list of items with some pertinent information and can give me a response. I don’t think I can do that without that middle step. Thoughts?

I think it’s still helpful to try anyway, because as you already said, it’s an inevitable middle step. Calling the vision API is inescapable.

By testing that out first, you can rule out some important things, and we can scaffold from what we know works.

For example, if a basic Vision API call works and an image URL can be passed without issue, then we know the problem lies with the function call. If it doesn’t work, then there may be an issue with the image URL itself. Once something is working, that’s when we can go back to how things are being passed around and what’s going on with the function call.

@Macha I did bypass sending the image to the chat API and sent it to the Vision API first. It provides a close enough response to what I’m asking for, and it can certainly read the image without issue. This leaves only the function calling as the issue. Maybe I will have to send it to vision first and then attach the response as an image_description to the message that gets sent to the API. Then, if there is an image description, I could try using that to trigger some new function, but that’s really not ideal.


Unfortunately the Assistants API does not support vision models yet, and the chat completions API does not support function calling with vision models. Both are planned, we’ll keep you updated!


@atty-openai Thanks for jumping in here. I just want to confirm this so I’m clear. At present, I can create a custom GPT with instructions, attach a photo in the chat box, ask it about the photo, and it retains the context so I can continue to ask questions about the photo. You are saying: there is no method at present for passing an image to either the chat API or the Assistants API and having that image trigger a custom function that would then send the image separately to the Vision API and report the answer back to either the assistant or chat API as the function tool response, simulating the assumed process that takes place with the custom GPTs we can make in ChatGPT?

To any future readers of this question and thread: I was sort of able to solve my problem. If you use the chat completions API, you can force it to use a function. What this means in the context of this thread is as follows:

  • You create a messages array.
  • You can create a message with a photo.
  • Send that photo to the Vision API and get your response first.
  • You can then take the text part of the message, along with a special trigger value, and send it to the chat completions API with forced function calling. The forced function call will recognize the trigger, at which point you can create a response that includes the response from the Vision API.

This is where the tricky part seems to come in: you then need to send the entire messages array, with your newly created response, to a second chat completions call. This is not like the Assistants API, where it just sits waiting on the update; you actually have to send it again “manually”. At this point it will respond to your initial text regarding the photo.

I confess this may not be the best way, and it may not work with a super long series of messages, but if those aren’t a concern, the process above seems to work.
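A rough sketch of the whole flow, in case it helps anyone (the model names, trigger value, prompts, and tool name are placeholders, so adapt as needed):

// Sketch of the workaround described above. All prompt text, the trigger value,
// and the tool name are assumptions, not exact code from my app.
const VISION_MODEL = "gpt-4-vision-preview";
const CHAT_MODEL = "gpt-3.5-turbo-0125";

const chatApi = (body) =>
    fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            'Authorization': `Bearer APIKEY`,
        },
        body: JSON.stringify(body),
    }).then(r => r.json());

// Step 1: get a textual description of the photo from the vision model first.
const describeImage = async (imageUrl) => {
    const data = await chatApi({
        model: VISION_MODEL,
        messages: [{
            role: "user",
            content: [
                { type: "text", text: "List the menu items and ingredients in this image." },
                { type: "image_url", image_url: { url: imageUrl } },
            ],
        }],
        max_tokens: 1000,
    });
    return data.choices[0].message.content;
};

const askAboutImage = async (userText, imageUrl) => {
    const imageDescription = await describeImage(imageUrl);

    // Step 2: send the text plus a trigger value, forcing the function call.
    const tools = [{
        type: "function",
        function: {
            name: "getAnalyzeImage", // hypothetical tool name
            description: "Returns an analysis of the attached image.",
            parameters: { type: "object", properties: {}, required: [] },
        },
    }];
    const messages = [{ role: "user", content: `${userText}\n[containsImage]` }]; // hypothetical trigger
    const first = await chatApi({
        model: CHAT_MODEL,
        messages,
        tools,
        tool_choice: { type: "function", function: { name: "getAnalyzeImage" } }, // forced call
    });

    // Step 3: append the tool call and the vision result, then send the whole
    // messages array back "manually" for the actual answer.
    const toolCall = first.choices[0].message.tool_calls[0];
    messages.push(first.choices[0].message);
    messages.push({ role: "tool", tool_call_id: toolCall.id, content: imageDescription });

    const second = await chatApi({ model: CHAT_MODEL, messages, tools, tool_choice: "none" });
    return second.choices[0].message.content;
};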

Hi, I just want to know how to call the Assistants API in my code and how I can send messages to it. I really need this. I am making a chatbot; I want to send questions to the Assistants API and show the answers on the screen. Please tell me how I can send and receive messages.