Different output from API and Playground (fine-tuning)

I’ve made a fine-tuned model which works perfectly in the Playground, but when I use my own script it produces a much worse result, and I have no idea why.

Playground example:
User:
Arne Jacobsen Seven chair in black leather, Fritz Hansen, black leather chrome frame produced in 2010 good condition few marks and scratches on the leather and a light patina chrome frame black leather,“Height 30 / Width 20 / Depth 40 / Seat height 44,5 Cm”

Assistant:
Arne Jacobsen Seven chair upholstered in black leather divided with a chrome frame. The chair is from 2010 and is generally in good condition, but with a few marks and small scratches on the leather, and a light patina on the chrome frame.

*Designer: Arne Jacobsen
*Manufacturer: Fritz Hansen
*Model: Seven Chair
*Dimensions: Height: 30 / Width: 20 / Depth: 40 / Seat height: 44,5 Cm
*Materials: Black leather and chrome frame
*Condition: Pre-owned chair from 2010

This is the perfect output, but when I use my own script, as shown below:

import csv 
import openai
import pandas as pd 
from dotenv import load_dotenv
import os

load_dotenv()
API_KEY = os.getenv("API")

openai.api_key = API_KEY
model_name = "ft:gpt-3.5-turbo-0613:personal::123123"

input_csv_path = "C:/Users/myname/OneDrive/code/auto-v/csv.csv"
#read the csv file into a dataframe
df = pd.read_csv(input_csv_path, header=None)

for index, row in df.iterrows():
  name = row[0]
  producer = row[4]
  description = row[5]
  measurements = row[6]

  print(name, producer, description, measurements)
  #generate text
  response = openai.Completion.create(
    engine = "text-davinci-002",
    prompt=f"write a product description on the following on this product: {name} {producer} {description} {measurements}",
    max_tokens=1000
  )
  print(response)
  #extract the generated text
  generated_text = response.choices[0].text.strip()

  #append the generated text to the last column
  row[9] = generated_text

#save the dataframe back to the csv file
df.to_csv(input_csv_path, header=False, index=False)

it returns this:
An ultra-modern classic, the Arne Jacobsen Seven chair is a beautifully designed piece that will add a touch of elegance to any home. The sleek black leather and chrome frame are perfect for a contemporary space, and the chair is in good condition with only a few marks and scratches on the leather and chrome. The chair is comfortable and stylish, and would make a great addition to any home.

Can anyone explain why the outputs are so massively different?
The input is exactly the same; I do extract it from a CSV file, but that shouldn’t make a difference.

Hi and welcome to the Developer Forum!

Are the replies identical each time? As in, have you run this test with a temperature of, say, 0.1, ten times and compared the results?

(I notice you have no temperature in your API call.)
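
For example, a fixed low temperature can be added to the existing call (a minimal sketch of the same call from your script, using the same variables read from the CSV; 0.1 is just an illustrative value):

response = openai.Completion.create(
  engine="text-davinci-002",
  prompt=f"write a product description on the following on this product: {name} {producer} {description} {measurements}",
  temperature=0.1,  #a low, fixed temperature reduces run-to-run randomness
  max_tokens=1000
)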

Currently, the replies are different each time I run the script.

I just tried adding a temperature and running a few tests, but I still get completely different returns, nowhere near the ones I get in the Playground, which are the desired output.

OK, so have you done the same ten times in the Playground?

Would you say that, without any room for interpretation, the ones from the Playground are far superior to the ones from the API, or could there be a subjective element of confirmation bias slipping in?

I’ve run around 20 tests in the Playground now, 10 with a temperature of 0.1 and 10 with a temperature of 1, and all the Playground results were perfect.

I’ve run the same number of tests with my script, and 0/20 returned a satisfactory response.

It seems like the API does not even take the training data into account. The format, etc. is clearly laid out in the training data I provided, and the Playground has no issue recognizing the “pattern”, but the API certainly does.

If there is no error in my code that could result in these “bad” responses, I would say that the Playground is far superior to the API.

If you have any other suggestions on how I could test and improve the results from my script, I would be more than grateful. Thank you.

Welcome to the community @Aps2212

Why are you using text-davinci-002 when you already have a model fine-tuned on your data?

Also, the model you fine-tuned is served on the Chat Completions API, and you’re using the Completions API in your script.

1 Like

I’m quite new to this, so I’m sorry if I’ve made an obvious mistake. What would you suggest I do instead?

Thank you for your reply

My bad for not spotting that at the start :smile: Ahh, don’t feel bad, I also missed it!

(Meant in reply to @Aps2212.)

1 Like

First read the docs.

Then use this model with the boilerplate chat completions code from the docs shared in my last reply, together with your prompt.

2 Likes

No worries @Foxalabs

It happens to the best of us.

Thank you, it seems I’ve fixed the issue based on your advice. I changed the code so it uses chat completions and my fine-tuned model instead, and now it works fine.
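
For anyone else hitting the same thing, the change looked roughly like this (a minimal sketch using the same legacy openai Python SDK as the original script; the model ID is the placeholder from earlier in the thread and the variables are the ones read from the CSV):

response = openai.ChatCompletion.create(
  model="ft:gpt-3.5-turbo-0613:personal::123123",  #the fine-tuned chat model instead of text-davinci-002
  messages=[
    {"role": "user", "content": f"write a product description on the following on this product: {name} {producer} {description} {measurements}"}
  ],
  temperature=0.1,
  max_tokens=1000
)
#chat completions return a message object rather than plain text
generated_text = response.choices[0].message.content.strip()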

Thank you both for your help @Foxalabs @sps

3 Likes