Malformed JSON in GPT4-1106 function arguments

The following script produces invalid JSON in the function arguments in around 30% of test cases.

from openai import OpenAI

# from openai.types.chat.completion_create_params import ResponseFormat
from dotenv import load_dotenv
import os
import json

load_dotenv()

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
messages = [
    {
        "role": "system",
        "content": "You are working with a python shell that has a pandas DataFrame.\nThe name of the dataframe is `df`.\nYou have pandas and numpy available as pd and np.This is the description of df:\n\nThis DataFrame contains balance sheet data of the company in question for several years.\nEach row contains the balance sheet data of a year.\nThe column 'Year' is very important. It is sorted DESC and contains the year for which the report is valid.\nThe column 'Currency' is also important, it specifies the currency in which the numbers are reported in.\n\nSome rules to follow:\n\n1. Group calculations per Year and include the Year column in your answer.\n2. If you are asked how a metric developed, respond with the absolute values of the last couple of years, not the percentage changes.\n3. Return information from the context in the form of a markdown table.\n4. Include the Currency column if possible.\n5. Don't use dropna on df, you could lose important information.\n6. If calculating diffs: use .diff(-1) to calculate differences to previous year. And don't include the Currency column for the diff, only in the assign part.\nGood example:\ndf[['Year', 'Total_Assets', 'Cash_Bank_Deposits', 'AS30', 'Goodwill', 'AS32', 'AS40']].diff(-1).assign(Year=df['Year'], Currency=df['Currency'])\n\nThis is the description of the relevant columns:\nColumn 'Trade_Receivables': Trade Receivables \nColumn 'Intragroup_Receivables': Intragroup Receivables\nColumn 'Other_Receivables': Other Receivables\nColumn 'Subtotal_Inventory': Subtotal Inventory\nColumn 'Total_Assets': Total Assets \nColumn 'Currency': Currency in which all figures for this balance sheet are reported\n\nThis is the result of `print(df.head())`:\n   Other_Receivables Currency  Year  Total_Assets  Intragroup_Receivables  Subtotal_Inventory  Trade_Receivables\n0       3.025e+09      EUR  2022  2.3510e+10                       0        8.789101e+08       7.813689e+08\n1       1.801e+09      EUR  2019  2.2478e+10                       0        7.458830e+08       8.145821e+08\n\nUse the tool to answer the questions posed to you.",
    },
    {
        "role": "user",
        "content": "For all of the following lines, calculate its share of total assets: Subtotal Liquid Assets, Trade Receivables, Intragroup Receivables, Other Receivables, Subtotal Inventory, Subtotal Tangible Fixed Assets, Subtotal Intangible Fixed Assets, Subtotal Financial Fixed Assets",
    },
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "python-tool",
            "description": "A Python shell. Use this to execute python commands.\nNever start variable names with numbers!\n",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Query",
                        "description": "python script or command WIHTOUT COMMENTS which will be evaluated by the eval command.\nNever start variable names with numbers!'",
                        "type": "string",
                    }
                },
                "required": ["query"],
            },
        },
    }
]

function_arguments = []
for i in range(10):
    chat_completion = client.chat.completions.create(
        messages=messages,
        model="gpt-4-1106-preview",
        tools=tools,
        tool_choice="auto",
        seed=42,
        # top_p=0,
        # temperature=0,
        top_p=0.000000000000001,
        temperature=0.000000000000001,
        n=1,
    )
    function_arguments.append(chat_completion.choices[0].message.tool_calls[0].function.arguments)
print("The number of unique different function_arguments is: ", len(set(function_arguments)))
malformed_argument_json = 0
for argument in function_arguments:
    try:
        json.loads(argument)
    except json.JSONDecodeError:
        malformed_argument_json += 1
print("The number of malformed function_arguments is: ", malformed_argument_json)

This results in:

The number of malformed function_arguments is: 3

I also tried the gpt4-0125 version, and that worked as expected (tested with 70 calls). However, our organization unfortunately doesn't have access to the newest version.
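As a stopgap on the client side, a lenient parser can sometimes recover the arguments. The sketch below assumes the malformed strings contain raw, unescaped newlines inside the "query" value; the thread does not confirm that this is the actual failure mode, so treat it as a guess. The helper name `parse_tool_arguments` is hypothetical.

```python
import json
from typing import Optional


def parse_tool_arguments(raw: str) -> Optional[dict]:
    """Decode tool-call arguments leniently; return None if they cannot be parsed."""
    try:
        # strict=False lets the json module accept raw control characters
        # (such as newlines) inside strings, which strict JSON forbids.
        # Assumption: unescaped newlines are what breaks the arguments.
        parsed = json.loads(raw, strict=False)
    except json.JSONDecodeError:
        return None
    # The tool schema in the script expects an object with a "query" key.
    if isinstance(parsed, dict) and "query" in parsed:
        return parsed
    return None
```

If this returns None, the caller can still count the response as malformed and retry the request.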

Some points for improvement:

  1. Avoid using negative prompts. Use as much declarative language as possible.
  2. Improve function documentation. Function documentation is also part of prompting, as under the hood functions are injected into the system message.
  3. Avoid ambiguous language.

It is also recommended to alter either top_p or temperature, not both.

For example:

tools = [
    {
        "type": "function",
        "function": {
            "name": "python-tool",
            "description": "Activate this tool to execute Python scripts and commands within an interactive environment. It is ideal for executing real-time code with standard Python syntax. For seamless execution, always begin variable names with letters.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Query",
                        "description": "Input a Python script or command to be evaluated. Make sure to use clear and comment-free code for evaluation. Begin variable names with letters to ensure syntax accuracy.",
                        "type": "string",
                    }
                },
                "required": ["query"],
            },
        },
    }
]
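The sampling advice above (alter either top_p or temperature, not both) could be enforced with a small helper along these lines. This is only a sketch: `sampling_kwargs` is a hypothetical name, and the seed value is copied from the original script.

```python
def sampling_kwargs(temperature=None, top_p=None, seed=42):
    """Build sampling keyword arguments, allowing only one of the two knobs."""
    if temperature is not None and top_p is not None:
        raise ValueError("set either temperature or top_p, not both")
    kwargs = {"seed": seed}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if top_p is not None:
        kwargs["top_p"] = top_p
    return kwargs
```

The result can be unpacked into the request, e.g. `client.chat.completions.create(messages=messages, model="gpt-4-1106-preview", tools=tools, **sampling_kwargs(temperature=0))`.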

No more "malformed JSON" when you use the pretraining the AI already has.

The work done is demonstrated by reading and understanding the 114 lines of replacement code...

from openai import OpenAI
import json

client = OpenAI()

messages = [
{
"role": "system", "content": """
You are working with a python tool that has a pandas DataFrame.
The name of the dataframe is `df`. This is the description of df:

 This DataFrame contains balance sheet data of the company in question for several years.
 Each row contains the balance sheet data of a year.
 The column 'Year' is very important. It is sorted DESC and contains the year for which the report is valid.
 The column 'Currency' is also important, it specifies the currency in which the numbers are reported in.

Some rules to follow:

1. Group calculations per Year and include the Year column in your answer.
2. If you are asked how a metric developed, respond with the absolute\
 values of the last couple of years, not the percentage changes.
3. Return information from the context in the form of a markdown table.
4. Include the Currency column if possible.
5. Don't use dropna on df, you could lose important information.
6. If calculating diffs: use .diff(-1) to calculate differences\
 to previous year. And don't include the Currency column for the diff,\
 only in the assign part.
  - Good example:
  - df[['Year', 'Total_Assets', 'Cash_Bank_Deposits', 'AS30', 'Goodwill',\
 'AS32', 'AS40']].diff(-1).assign(Year=df['Year'], Currency=df['Currency'])

This is the description of the relevant columns:
Column 'Trade_Receivables': Trade Receivables 
Column 'Intragroup_Receivables': Intragroup Receivables
Column 'Other_Receivables': Other Receivables
Column 'Subtotal_Inventory': Subtotal Inventory
Column 'Total_Assets': Total Assets 
Column 'Currency': Currency in which all figures for this balance sheet are reported

This is the result of `print(df.head())`:
   Other_Receivables Currency  Year  Total_Assets  Intragroup_Receivables  Subtotal_Inventory  Trade_Receivables
0       3.025e+09      EUR  2022  2.3510e+10                       0        8.789101e+08       7.813689e+08
1       1.801e+09      EUR  2019  2.2478e+10                       0        7.458830e+08       8.145821e+08

Use the python tool to access this dataframe in order to answer any user questions about data.

# Tools

## python

When you send a message containing Python code to python,\
 it will be executed within an exec() command in local environment.\
 python will respond with the output of the execution.\
 Any Python code to access Internet is denied.\
 You must never include comments - lines never start with #.\
 Variable names never start with a number, and prefer snake_case.\
 You have pandas and numpy available as pd and np.
IMPORTANT: NO COMMENTS EVER IN PYTHON CODE

""".strip(),
},
{
    "role": "user", 
    "content": (
        "For all of the following lines, calculate its share of total assets:"
        " Subtotal Liquid Assets, Trade Receivables, Intragroup Receivables,"
        " Other Receivables, Subtotal Inventory, Subtotal Tangible Fixed Assets,"
        " Subtotal Intangible Fixed Assets, Subtotal Financial Fixed Assets"
    ),
}
]

# Define dummy function or your functions ---
functions = [
{
"name": "disabled",
"description": "tool functions other than python are disabled",
"parameters": {
    "type": "object",
    "properties": {
        "none": {
            "type": "null",
            }
        },
    "required": ["query"]
    },
}
]

# Initialize function arguments list
function_arguments = []
chats = []
# Create chat completions and append function arguments
for _ in range(1):
    chat_completion = client.chat.completions.create(
        messages=messages,
        model="gpt-4-1106-preview",
        functions=functions,
        seed=42,
        top_p=1e-9,
        temperature=1e-9,
        logit_bias = {
            674: -20,  # " #"
            5062:-20,  # "#"
            },
    )
    message = chat_completion.choices[0].message
    chat = message.content or ""
    # function_call is None when the model answers without calling a function
    arguments = message.function_call.arguments if message.function_call else ""
    chats.append(chat)
    function_arguments.append(arguments)
    print(f"chat ======\n{chat}\nfunction args ======\n{arguments}")

# Use a set to count the distinct argument strings
print("The number of unique different function_arguments is: ", len(set(function_arguments)))

Response + python function_call argument:

chat ======

To calculate the share of each line item of the total assets, we need to divide the value of each line item by the total assets for the corresponding year. However, the DataFrame provided does not contain columns for Subtotal Tangible Fixed Assets, Subtotal Intangible Fixed Assets, or Subtotal Financial Fixed Assets. Therefore, I will only calculate the share of the available line items: Subtotal Liquid Assets, Trade Receivables, Intragroup Receivables, Other Receivables, and Subtotal Inventory.

Since the column 'Subtotal Liquid Assets' is not present in the DataFrame, I will assume that it is the sum of 'Trade Receivables', 'Intragroup Receivables', and 'Other Receivables'. If this is incorrect, please provide the correct definition for 'Subtotal Liquid Assets'.

Let's proceed with the calculation for the available line items.

function args ======

df['Subtotal_Liquid_Assets'] = df['Trade_Receivables'] + df['Intragroup_Receivables'] + df['Other_Receivables']
df['Share_Subtotal_Liquid_Assets'] = df['Subtotal_Liquid_Assets'] / df['Total_Assets']
df['Share_Trade_Receivables'] = df['Trade_Receivables'] / df['Total_Assets']
df['Share_Intragroup_Receivables'] = df['Intragroup_Receivables'] / df['Total_Assets']
df['Share_Other_Receivables'] = df['Other_Receivables'] / df['Total_Assets']
df['Share_Subtotal_Inventory'] = df['Subtotal_Inventory'] / df['Total_Assets']

shares_df = df[['Year', 'Currency', 'Share_Subtotal_Liquid_Assets', 'Share_Trade_Receivables', 
                'Share_Intragroup_Receivables', 'Share_Other_Receivables', 'Share_Subtotal_Inventory']]
shares_df

Valid JSON and determinism are no longer factors; the output just has to be valid code.
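Since the arguments now arrive as raw Python rather than JSON, it may be worth checking them before execution. The following is a minimal sketch, not part of the code above: the helper names are hypothetical, and it assumes that a JSON-wrapped response, if one ever appears, uses a "query" key as in the original tool schema.

```python
import ast
import json


def extract_code(arguments: str) -> str:
    """Return the Python source, whether it came JSON-wrapped or as raw code."""
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError:
        # Not JSON: treat the whole string as Python source.
        return arguments
    if isinstance(parsed, dict) and isinstance(parsed.get("query"), str):
        return parsed["query"]
    return arguments


def is_valid_python(source: str) -> bool:
    """Check that the source parses, without executing it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

`ast.parse` only verifies syntax; it does not make the code safe to run, so any sandboxing still has to happen separately.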

@sps Thanks for the hints, I tried those recommendations and it marginally improved the error rate in my tests.

@_j This does work in my tests, but I have two major issues with it:

  • The library I am using (llama_index) abstracts away the tool calls, so either the library would have to be rewritten or I would have to implement the OpenAIAgent myself to a large degree.
  • The functions argument is marked as "deprecated". I am therefore hesitant to build around and rely on that logic.
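On the deprecation concern: a legacy `functions` list can be wrapped into the shape the `tools` parameter expects, as the sketch below shows. The helper name `functions_to_tools` is hypothetical, and whether the model still emits the raw Python call when the request uses `tools` instead of `functions` is untested here.

```python
def functions_to_tools(functions):
    """Wrap legacy function specs in the {"type": "function", ...} tools shape."""
    return [{"type": "function", "function": spec} for spec in functions]


# Example spec, mirroring the dummy function used earlier in the thread.
legacy_functions = [
    {
        "name": "disabled",
        "description": "tool functions other than python are disabled",
        "parameters": {"type": "object", "properties": {}},
    }
]
tools = functions_to_tools(legacy_functions)
```

With `tools`, the response carries `message.tool_calls` rather than `message.function_call`, so the reading side would need to change as well.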