Hello,
I want to run a set of varied questions through the API. Some of these questions are about the current weather.
My goal is to let the assistant choose between answering with a normal chat completion and replying with a function call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                    },
                },
                "required": ["location", "format"],
            },
        }
    }
]
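import json
import time

# df, client, gpt_model and num_iterations are defined elsewhere
# Loop through your dataset
for i in range(len(df)):
    # Your prompt or input from the dataset
    user_input = df.loc[i, 'Question']  # Adjust column name as needed
    for j in range(num_iterations):
        # Start timer for measuring inference time
        start = time.perf_counter()
        # Call OpenAI API for inference
        response = client.chat.completions.create(
            model=gpt_model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant. Use functions if appropriate."},
                {"role": "user", "content": user_input}
            ],
            # `functions`/`function_call` are deprecated in favor of `tools`/`tool_choice`
            tools=tools,
            tool_choice="auto"
        )
        # End timer right after the call so only the API round trip is measured
        duration = time.perf_counter() - start
        # Extract the assistant's response; content is None when the model calls a function
        message = response.choices[0].message
        if message.tool_calls:
            call = message.tool_calls[0].function
            # json.loads may raise if the model emits invalid JSON arguments
            result = json.dumps({"type": "function", "function": {"name": call.name, "parameters": json.loads(call.arguments)}})
        else:
            result = message.content
        # Store the response and the duration in the DataFrame
        df.at[i, f'Result_{j}'] = result
        df.at[i, f'Inference_Time_{j}'] = duration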
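Since the function-call arguments are generated by the model as a JSON string, they are not guaranteed to be valid JSON or to respect the schema in the `tools` list above. Below is a minimal sketch of how they could be checked before counting a call as correct. The names `WEATHER_PARAMS` and `validate_arguments` are hypothetical helpers, not part of the OpenAI library, and the check only covers required keys, string types, and enum values.

```python
import json

# Trimmed copy of the "parameters" schema from the tools list above,
# repeated here so the sketch runs on its own (assumption: same schema).
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location", "format"],
}


def validate_arguments(raw_arguments, schema):
    """Return a list of problems found in a JSON argument string (empty if none)."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments are not a JSON object"]

    problems = []
    properties = schema.get("properties", {})
    # Every required key must be present.
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    # Every supplied key must be known, a string, and within its enum if one is set.
    for key, value in args.items():
        spec = properties.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {key} is not a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"argument {key} has invalid value: {value}")
    return problems


print(validate_arguments('{"location": "Kuala Lumpur, Malaysia", "format": "celsius"}', WEATHER_PARAMS))
print(validate_arguments('{"location": "Beijing", "format": "kelvin"}', WEATHER_PARAMS))
```

An empty list means the arguments passed these checks. A full JSON Schema validator would be stricter, but this is probably enough to separate malformed calls from well-formed ones when comparing models.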
The code runs without errors, but since this is my first attempt, I would like your feedback on whether this is the right approach for my goal.
Sample questions and the answers I would expect to get:
- how warm is it in Kuala Lumpur right now?, "{"type": "function", "function": {"name": "get_current_weather", "parameters": {"location": "Kuala Lumpur, Malaysia", "format": "celsius"}}}"
- Has Beijing been hotter than usual this spring?, "Accurate and detailed historical weather data, such as temperature extremes or precipitation levels, can be accessed through specialized weather databases or meteorological agencies."
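One way to score a result against an expected answer like the two samples above is sketched here. It assumes that both the expected answer and the stored result are strings, and that a function call is stored as JSON in the same shape as the first sample. That shape is this post's own convention, not the raw API response format. `parse_call` and `score` are hypothetical helper names.

```python
import json


def parse_call(text):
    """Return (name, parameters) if text is a function-call JSON string, else None."""
    try:
        data = json.loads(text)
    except (TypeError, json.JSONDecodeError):
        # Plain-text answers (or missing values) are not function calls.
        return None
    if not isinstance(data, dict) or data.get("type") != "function":
        return None
    function = data.get("function")
    if not isinstance(function, dict):
        return None
    return function.get("name"), function.get("parameters")


def score(expected, actual):
    """Compare one expected answer with one model result."""
    expected_call = parse_call(expected)
    actual_call = parse_call(actual)
    # Intent is detected correctly when both sides agree on "call" vs "plain text".
    intent_match = (expected_call is None) == (actual_call is None)
    # Exact match on name and parameters is only defined when a call was expected.
    call_match = None if expected_call is None else expected_call == actual_call
    return {"intent_match": intent_match, "call_match": call_match}


expected = ('{"type": "function", "function": {"name": "get_current_weather", '
            '"parameters": {"location": "Kuala Lumpur, Malaysia", "format": "celsius"}}}')
actual = ('{"type": "function", "function": {"name": "get_current_weather", '
          '"parameters": {"location": "Kuala Lumpur", "format": "celsius"}}}')
print(score(expected, actual))
```

Exact matching is deliberately strict: in the usage example the model picked the right function but wrote the location differently, so `intent_match` is true while `call_match` is false. A looser comparison of the `location` value may be fairer, depending on what the evaluation is meant to measure.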
Background:
I am comparing how well different models detect intent and how accurately they use tools. GPT-3.5 and GPT-4 versions will serve as the baseline. In my thesis I will use open models (Llama 2 and Mistral) and compare how fine-tuned models can (possibly) improve function-calling performance.