I tried to get deterministic results from the OpenAI API via the Python package. Every attempt failed:
- setting a seed
- setting temperature to 0
- setting top_p to 0
- setting temperature / top_p to 0.000000000000001
- setting n to 1
I also tried various combinations of the above.
I tried it with gpt-4-0125-preview as well as gpt-4-1106-preview, both of which should support the seed argument (and neither raises an error when it is passed).
Here is the script to reproduce the issue:
from openai import OpenAI
# from openai.types.chat.completion_create_params import ResponseFormat
from dotenv import load_dotenv
from pydantic import BaseModel, Field
import os
import json
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
class PythonInput(BaseModel):
    query: str = Field(
        description="python script or command WIHTOUT COMMENTS which will be evaluated by the eval command.\nNever start variable names with numbers!'"
    )
parameters = PythonInput.schema()
parameters = {k: v for k, v in parameters.items() if k in ["type", "properties", "required", "definitions"]}
fn_schema_str = json.dumps(parameters)
tool_desc = (
    "> Tool Name: python-tool\n"
    "Tool Description: A Python shell. Use this to execute python commands.\nNever start variable names with numbers!\n"
    f"Tool Args: {fn_schema_str}\n"
)
long_system_message = """You are working with a python shell that has a pandas DataFrame.
The name of the dataframe is `df`.
You have pandas and numpy available as pd and np.This is the description of df:
This DataFrame contains balance sheet data of the company in question for several years.
Each row contains the balance sheet data of a year.
The column 'Year' is very important. It is sorted DESC and contains the year for which the report is valid.
The column 'Currency' is also important, it specifies the currency in which the numbers are reported in.
Some rules to follow:
1. Group calculations per Year and include the Year column in your answer.
2. If you are asked how a metric developed, respond with the absolute values of the last couple of years, not the percentage changes.
3. Return information from the context in the form of a markdown table.
4. Include the Currency column if possible.
5. Don't use dropna on df, you could lose important information.
6. If calculating diffs: use .diff(-1) to calculate differences to previous year. And don't include the Currency column for the diff, only in the assign part. Good example:
df[['Year', 'Total_Assets', 'Cash_Bank_Deposits', 'AS30', 'Goodwill', 'AS32', 'AS40']].diff(-1).assign(Year=df['Year'], Currency=df['Currency'])
This is the description of the relevant columns:
Column 'Trade_Receivables': Trade Receivables
Column 'Intragroup_Receivables': Intragroup Receivables
Column 'Other_Receivables': Other Receivables
Column 'Subtotal_Inventory': Subtotal Inventory
Column 'Total_Assets': Total Assets
Column 'Currency': Currency in which all figures for this balance sheet are reported
This is the result of `print(df.head())`:
Other_Receivables Currency Year Total_Assets Intragroup_Receivables Subtotal_Inventory Trade_Receivables
0 3.025e+09 EUR 2022 2.3510e+10 0 8.789101e+08 7.813689e+08
1 1.801e+09 EUR 2019 2.2478e+10 0 7.458830e+08 8.145821e+08
## Tools
You have access to tools. You are responsible for using
the tools in any sequence you deem appropriate to complete the task at hand.
This may require breaking the task into subtasks and using different tools
to complete each subtask.
You have access to the following tools:
{tool_desc}
If you want to call a tool, respond with the following json-format:
{{"tool_name": <tool_name, one of [python-tool]>, "tool_args": <json of tool_args, corresponding to the schema>}}
Use the tool to answer the questions posed to you.""".format(
    tool_desc=tool_desc
)
responses = []
for i in range(10):
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": long_system_message,
            },
            {
                "role": "user",
                "content": "For all of the following lines, calculate its share of total assets: Subtotal Liquid Assets, Trade Receivables, Intragroup Receivables, Other Receivables, Subtotal Inventory, Subtotal Tangible Fixed Assets, Subtotal Intangible Fixed Assets, Subtotal Financial Fixed Assets",
            },
        ],
        model="gpt-4-0125-preview",
        seed=42,
        # top_p=0,
        # temperature=0,
        top_p=0.000000000000001,
        temperature=0.000000000000001,
        n=1,
        response_format={"type": "json_object"},
    )
    responses.append(chat_completion.choices[0].message.content)
print("The number of unique different responses is: ", len(set(responses)))
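
One thing that might help narrow this down: as far as I understand the documentation, `seed` is only best effort, and each response carries a `system_fingerprint` that changes when the backend configuration changes. The sketch below is not part of my original script. It groups outputs by fingerprint and compares the JSON content in canonical form, so that differences in key order or whitespace are not counted as different answers. The helper names `canonical` and `summarize_runs` and the sample data are made up for illustration.

```python
import json
from collections import defaultdict


def canonical(content):
    """Return a canonical form of a JSON string; fall back to the raw text."""
    try:
        return json.dumps(json.loads(content), sort_keys=True, separators=(",", ":"))
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON (or not a string): compare the value as it is.
        return content


def summarize_runs(runs):
    """Count distinct canonical outputs per system fingerprint.

    `runs` is an iterable of (system_fingerprint, content) pairs.
    """
    by_fingerprint = defaultdict(set)
    for fingerprint, content in runs:
        by_fingerprint[fingerprint].add(canonical(content))
    return {fp: len(outputs) for fp, outputs in by_fingerprint.items()}


# Made-up data for illustration only, not real API output.
example_runs = [
    ("fp_a", '{"tool_name": "python-tool", "tool_args": {"query": "df"}}'),
    ("fp_a", '{"tool_args": {"query": "df"}, "tool_name": "python-tool"}'),
    ("fp_b", '{"tool_name": "python-tool", "tool_args": {"query": "df.head()"}}'),
]
print(summarize_runs(example_runs))  # {'fp_a': 1, 'fp_b': 1}
```

To use this with the script above, one would collect `(chat_completion.system_fingerprint, chat_completion.choices[0].message.content)` in the loop instead of only the content. More than one distinct output under a single fingerprint would suggest the variation is not explained by a backend change alone.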