Template messages with function calling and less AI interception

Currently I am working on an application where an assistant has to answer user messages, but sometimes it needs to return fixed template messages, e.g. order confirmations or patient information.

The problem I am stuck on is that I cannot return a template message when using function calling the way I can when using structured output.

Example of structured output:

User: how to solve 2x + 7^x = 9

Structured Output:
{"steps": [{"type": "step", "explanation": "We start with the equation", "output": "2x + 7^x = 9."}, { "type": "step", "explanation": "To isolate terms, rewrite the equation", "output": "7^x = 9 - 2x."}, {"type": "step","explanation": "This is a transcendental equation and can't be solved algebraically. We will use numerical or graphical methods to find an approximate solution.", "output": "Use numerical methods or graph the functions."}, {"type": "step","explanation": "We can plot the functions f(x) = 2x + 7^x and g(x) = 9 to find the intersection.", "output": "Find intersection points.")],"final_answer": "The solution can be approximated numerically or graphically. The approximate solution is x ≈0.2."}}

Output:
Step 1. We start with the equation
Section Answer: 2x + 7^x = 9.

Step 2. To isolate terms, rewrite the equation
Section Answer: 7^x = 9 - 2x.

Step 3. This is a transcendental equation and can’t be solved algebraically. We will use numerical or graphical methods to find an approximate solution.
Section Answer: Use numerical methods or graph the functions.

Step 4. We can plot the functions f(x) = 2x + 7^x and g(x) = 9 to find the intersection.
Section Answer: Find intersection points.

The solution can be approximated numerically or graphically. The approximate solution is x ≈ 0.2.


I want to do the same thing with function calling, but as I understand it, when you use a function call, its output has to be pushed back to the LLM to complete the run and add the message to the thread. The output is pushed using openai.beta.threads.runs.submit_tool_outputs(...).

But I want to return the template message to the end user, the same way it is done when structured output is used.
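For reference, the flow I am describing looks roughly like this (a minimal sketch; handle_tool_call is a placeholder for my own dispatch logic):

import json

if run.status == "requires_action":
    tool_outputs = []
    for tool_call in run.required_action.submit_tool_outputs.tool_calls:
        result = handle_tool_call(tool_call)  # placeholder: my own dispatch logic
        tool_outputs.append({
            "tool_call_id": tool_call.id,
            "output": json.dumps(result),
        })
    # The run cannot complete until the outputs are pushed back to the LLM
    client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run.id,
        tool_outputs=tool_outputs,
    )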

Solutions I came up with:

  1. Cancel the run and then add the template message (with the arguments filled in) to the thread; but in this case we might lose other tool calls, if there are any. (A sketch of this approach is after the example code below.)
  2. Let the assistant generate a response (do not cancel), but save the template message to memory storage, and when run.status == "completed", delete the generated response and append the one from memory storage.
    Example Code:
memory_message = ""
req_action = False
thread_id = ''
if run.statuts == "requires_action":
    handle_tool_calls(run, client, thread_id) # also add template messages to memory_storage variable
    req_action = True

elif run.status == "completed":                                       
    messages = client.beta.threads.messages.list(thread_id=thread_id)                                                                                                                           
    message = messages.data[0]
    if req_action == True:
          if message['role'] == 'assistant':             
               client.beta.threads.messages.delete(thread_id=thread_id, message_id=message.id)
               client.beta.threads.messages.create(thread_id=thread_id, role='assistant', content=memory_message)
               req_action = False
    return message.content[0].text.value
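For comparison, solution 1 (cancel the run) would look roughly like this (just a sketch: cancellation is asynchronous, other pending tool calls in the same run are lost, and build_template is a placeholder for my formatting logic):

if run.status == "requires_action":
    # Build the template message from the tool call arguments ourselves
    tool_call = run.required_action.submit_tool_outputs.tool_calls[0]
    arguments = json.loads(tool_call.function.arguments)
    template_message = build_template(arguments)  # placeholder

    # Cancel the run instead of submitting tool outputs
    client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run.id)
    # (would have to poll until run.status == "cancelled" before adding messages)

    # Append the template message to the thread directly
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="assistant",
        content=template_message,
    )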

I have not tested it yet, but I think it will work.

So in the end:

  1. Is there a better way to make this work? Right now I am literally wasting tokens and time.
  2. I want to request a feature to allow less AI interception when there is a function call that returns a template message.

I think you need to be more clear about what you’re trying to accomplish.

Specifically, what you mean by “template messages” and by “AI interception.”

You can use structured output and function calling together; they aren’t mutually exclusive.

https://platform.openai.com/docs/guides/structured-outputs
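For example, in the Chat Completions API you can pass tools and a response_format in the same request; the model either calls a tool or produces a final reply matching the schema. A rough sketch (the tool and schema here are made up for illustration):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How much is delivery to Ovijochi 7?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "find_by_address",
            "description": "Look up delivery info for an address",
            "parameters": {
                "type": "object",
                "properties": {"address": {"type": "string"}},
                "required": ["address"],
            },
        },
    }],
    # Structured output: any final (non-tool-call) reply must match this schema
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "delivery_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "address": {"type": "string"},
                    "delivery_cost": {"type": "number"},
                },
                "required": ["address", "delivery_cost"],
                "additionalProperties": False,
            },
        },
    },
)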

Do you want to use the function call to solve the equation? That seems like it’d be a good idea to get an actual useful answer reliably as opposed to hallucinated nonsense.

What is the problem you’re trying to overcome? What is the goal?

  1. My assistant has to use file_search and functions, so I cannot use structured output.

  2. I have a function that should return information about delivery to a specific address.

import logging

calculator = DeliveryZoneCalculator()

def find_by_address(address):
    try:
        delivery_info = calculator.get_delivery_info_by_address(address)

        # None when the zone has no free-delivery threshold
        min_free_delivery = delivery_info['min_free_delivery'] or None

        info = {
            "address": delivery_info['address'],
            "delivery_cost": delivery_info['delivery_cost'],
            "free_delivery_for_orders_over": min_free_delivery,
        }
        return info

    except Exception as e:
        logging.error(f"Error: {e}")
        return "Address is not found!"


import json
import logging

if tool_call.function.name == "find_by_address":
    try:
        arguments = json.loads(tool_call.function.arguments)
        output = find_by_address(arguments["address"])
        # This is a template message that does not change; only the arguments are pasted in.
        # I need these template messages so the customer sees only the important
        # information, in a structured way.
        template_message = (
            f"*Address*: {output.get('address')}\n"
            f"*Delivery cost*: {output.get('delivery_cost')}\n"
            f"*Free delivery for orders over*: {output.get('free_delivery_for_orders_over')}"
        )
        memory_message += f"{template_message}\n"
        client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread_id,
            run_id=run.id,
            tool_outputs=[{
                "tool_call_id": tool_call.id,
                "output": json.dumps(output),
            }],
        )
    except Exception as e:
        logging.error(f"Error handling tool call: {e}")

Example output:

Address: Ovijochi 7
Delivery cost: 1250
Free delivery for orders over: 14000

Without template message:

The delivery cost to Ovijochi 7 is 1250. Please note that delivery is free for orders over 14,000. Would you like to place an order?

So the output may vary depending on the messages in the thread, which is what I want to avoid.

I’m not entirely sure what you’re using the LLM for specifically in this case, but let’s say, for example, you’re using it for NLU to parse addresses that come in varying formats.

Why not just have the LLM call a tool where the argument is the address, have the client call the function, and return the output of that function to the front end directly in a structured way? Your program/function is what returns the output. You don’t need to rely on an LLM for formatting.

The LLM can receive either just a success/fail message or the full output from the tool, depending on whether it’s needed for the conversation later.
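As a rough sketch of what I mean, reusing your find_by_address (send_to_frontend is a placeholder for however you push messages to the user):

if tool_call.function.name == "find_by_address":
    arguments = json.loads(tool_call.function.arguments)
    output = find_by_address(arguments["address"])

    if isinstance(output, dict):  # the error path returns a plain string
        # Your code formats and shows the message; the LLM never rewrites it
        template_message = (
            f"*Address*: {output.get('address')}\n"
            f"*Delivery cost*: {output.get('delivery_cost')}\n"
            f"*Free delivery for orders over*: {output.get('free_delivery_for_orders_over')}"
        )
        send_to_frontend(template_message)  # placeholder

    # The LLM only needs a short confirmation so the run can complete
    client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run.id,
        tool_outputs=[{
            "tool_call_id": tool_call.id,
            "output": json.dumps({"status": "shown_to_user"}),
        }],
    )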

Note that while you can’t use “structured output” per se, you can still ask the LLM to output its reply in a structured way (valid JSON, for example) and provide examples in your prompt.
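Something like this in your instructions, for example (just an illustration, not a guaranteed format):

instructions = """
When you report delivery information, reply with valid JSON only, e.g.:
{"address": "Ovijochi 7", "delivery_cost": 1250, "free_delivery_for_orders_over": 14000}
Do not add any text outside the JSON object.
"""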