GPT 4 vs 3.5-turbo-instruct

mowe310 · October 23, 2023, 8:12am

Hello everyone,

I am currently writing a small script that extracts information from Excel tables. I am using GPT for this because the tables have a different format every time, and it takes some AI to work out which columns correspond to which information.
In this task, I need GPT to follow the instructions very closely. I tell GPT to format the output in a specific way so that I can later load the information into a dataframe. However, GPT sometimes deviates from the system instructions and skips some of them.

This is where my question comes in: Which model is better at following detailed instructions, GPT-4 or GPT-3.5-Turbo-Instruct? Has anyone done extended testing on this? I know that Turbo-Instruct is fairly new. I wonder if it even performs better than GPT-4 due to its native instruct nature.

And a second question: How can I improve how accurately GPT follows instructions? My GPT-4 input is split into three parts: a system message that says what to do and what format to use, a user message giving an example prompt, and an assistant message giving an example answer. I also tried modifying the temperature and top_p parameters. The results are nearly perfect at first; however, after the 2nd or 3rd table GPT seems to become more and more inaccurate, skipping columns and rows, even though every table is a completely new completion call that has nothing to do with the previous one.
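The three-part layout described above (system instructions, one example exchange, then the real table) can be sketched as a small helper that builds a fresh message list for every table, so nothing carries over from one call to the next. This is only an illustration: `build_messages` and its parameter names are hypothetical, and the returned list is what would be passed as `messages` to a chat completion call.

```python
def build_messages(system, example_prompt, example_answer, table_text):
    """Return a new message list for one table (hypothetical helper)."""
    return [
        {"role": "system", "content": system},             # what to do, which format
        {"role": "user", "content": example_prompt},       # example input
        {"role": "assistant", "content": example_answer},  # example output
        {"role": "user", "content": table_text},           # the table to process
    ]


# Two tables yield two independent lists that share no state.
messages_a = build_messages("Follow the format.", "example in", "example out", "table A")
messages_b = build_messages("Follow the format.", "example in", "example out", "table B")
```

Because each call gets its own list, a drop in quality on later tables cannot come from leftover conversation history on the client side.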

Foxalabs · October 23, 2023, 11:21am

Hi and welcome to the Developer Forum!

GPT-4 is the better choice for instruction following. The 3.5-turbo-instruct model is called that for legacy purposes and is in fact not an instruction-following tuned model.

kjordan · October 23, 2023, 12:00pm

From my personal experiments, 3.5-turbo-instruct is worse at following instructions than 3.5-turbo, not to mention gpt-4.

It would be helpful if you could share some snippets of your code so we can try to reproduce the issues.

mowe310 · October 23, 2023, 12:12pm

Thanks for the answers!

I’ve done some extensive testing now. I don’t have explicit numbers, but from my tests both seem to perform similarly in terms of how often they fail to follow the instructions.
However, 3.5-Turbo-Instruct is still a GPT-3.5 model, which is far more likely to hallucinate than GPT-4. The Instruct model kept making up product information. This is why I decided to use GPT-4.
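One rough way to spot made-up product information is to check that values which should be copied verbatim, such as item numbers, actually occur in the table text that was sent. The sketch below is an assumption-laden illustration, not a tested method: `unsupported_values` is a hypothetical helper, the rows are assumed to be dicts keyed by column name, and a plain substring test will give false alarms for fields the model is asked to translate or reformat.

```python
def unsupported_values(source_text, rows, field="product_number"):
    """Return non-empty values of `field` that never occur in the source text."""
    haystack = source_text.lower()
    return [
        row[field]
        for row in rows
        if row.get(field) and row[field].lower() not in haystack
    ]


# Hypothetical input: one real item number, one invented, one left empty.
source = "|HC2301（Inquiry Nr.1:）|Hot and cold gel pack without glitter|PE & gel|"
extracted = [
    {"product_number": "HC2301"},
    {"product_number": "HC9999"},
    {"product_number": ""},
]
suspicious = unsupported_values(source, extracted)
```

Empty fields are skipped on purpose, since leaving a field empty is the instructed behaviour when information is missing.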

About my second question: maybe you have an idea on how to improve the instructions:

    import openai  # legacy (pre-1.0) SDK, which provides openai.ChatCompletion
    import pandas as pd

    system = """In this conversation, you will play the role of a data receiver.
    You will be given data from an Excel file with different format, and you will retrieve the required information.
    For the Data retrieving, you will be using the format of the example. You will not add any comments or explanations.
    You will strictly follow the example format. You will not add or remove any data fields. If you are missing information on a field, just leave it empty.
    You will always include the <break> at the end of a line. You will always include the labels from the example. You will extract all the products from the given List.
    Note that the description field may include information necessary in other fields. Translate Chinese to English.
    """
    # The example prompt contains an example input that may look similar to future input
    example_prompt = """（GROUP） CORP., LTD|Unnamed: 1|Unnamed: 2|Unnamed: 3|Unnamed: 4|Unnamed: 5|Unnamed: 6|Unnamed: 7|Unnamed: 8|Unnamed: 9|Unnamed: 10
       ||||||||||Date:2023-09-21
    Picture |Item No.|Description|Material|Dimension cm|Weight G |Filling|Price  (USD/PC)|MOQ(PCS)| Individual Packing |Remarks
    |HC2301（Inquiry Nr.1:）|Hot and cold gel pack without glitter|PE & gel|15 x 23 cm|350|Gel|0.94|3000pcs|color box|1 color printing on the  product,extra set up charge for each shape is USD115.
    |total of 3 different variants+1 pouch|Hot and cold gel pack with glitter|PE & gel|14.5X13CM;15.5X8.5CM;29X12CM|170g+120g+350g|Gel|1.99|3000sets|color box|1 color printing on the  product,extra set up charge for each shape is USD115.
    TERMS & CONDITIONS:||||||||||
    1)PRICE TERM:FOB SHANGHAI.||||||||||
    2) PRICE VALID:IN 2 MONTHS (60 DAYS)|||| ||||||
    3) DELIVERY TIME: 30- 40 DAYS AFTER RECEVING CONFIRMED ORDER.||||||||||
    4)HANDLING CHARGE: USD200 PER SHIPMENT FOR ORDER AMOUNT BELOW 5000USD.||||||||||
    """

    # The example answer is the expected output that ChatGPT should generate
    example_answer = """
    product_number|product_name|description|material|customs_tariff_number_germany|weight[g]|dimensions|MQB|price|delivery_time <break>
    HC2301|gel pack|Hot and cold gel pack without glitter|PE & gel||350g|15 x 23 cm|3000|0,68 $|30-40 days <break>
    |gel pack|Hot and cold gel pack with glitter|PE & gel||170g+120g+350g|14.5X13CM;15.5X8.5CM;29X12CM|3000|1,62|30-40 days <break>
    """

    # for csv in importCSV:
    #     print(csv)

    # Define the column labels
    column_labels = ['product_number', 'product_name', 'description', 'material', 'customs_tariff_number_germany', 'weight[g]', 'dimensions', 'MQB', 'price', 'delivery_time']

    # Create an empty DataFrame with labels
    products = pd.DataFrame(columns=column_labels)

    for csv in importCSV:
        print('########################## Request')
        print(csv)
        # Each table gets its own, independent chat completion call
        completion = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=0.2,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": example_prompt},
                {"role": "assistant", "content": example_answer},
                {"role": "user", "content": csv},
            ],
        )
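Since the goal is to load the reply into a dataframe and the failure mode is skipped columns and rows, it may help to validate every row before it is accepted. The following is a minimal sketch under stated assumptions: `parse_reply` is a hypothetical helper, it assumes the reply uses `|` between fields and `<break>` at the end of each row as in the example answer, and it assumes no field itself contains a `|`.

```python
COLUMN_LABELS = [
    "product_number", "product_name", "description", "material",
    "customs_tariff_number_germany", "weight[g]", "dimensions",
    "MQB", "price", "delivery_time",
]


def parse_reply(reply, labels=COLUMN_LABELS):
    """Split a '<break>'-terminated, pipe-delimited reply into row dicts.

    Returns (rows, problems): rows that have the expected number of
    fields, and the raw text of rows with too few or too many fields.
    """
    rows, problems = [], []
    for chunk in reply.split("<break>"):
        line = chunk.strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split("|")]
        if fields == list(labels):
            continue  # the header row only repeats the labels; skip it
        if len(fields) != len(labels):
            problems.append(line)  # a column was skipped or added
            continue
        rows.append(dict(zip(labels, fields)))
    return rows, problems


# Hypothetical reply: a header, one well-formed row, one row with missing columns.
sample_reply = """
product_number|product_name|description|material|customs_tariff_number_germany|weight[g]|dimensions|MQB|price|delivery_time <break>
HC2301|gel pack|Hot and cold gel pack without glitter|PE & gel||350g|15 x 23 cm|3000|0,68 $|30-40 days <break>
HC2302|gel pack|only three fields <break>
"""
rows, problems = parse_reply(sample_reply)
```

The list of dicts can then be passed to `pd.DataFrame(rows, columns=COLUMN_LABELS)`, and a non-empty `problems` list is a signal to inspect or retry that table instead of silently storing incomplete rows.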
