My task is to extract certain facts from user-supplied text and return them in a structured format. I’ve written code that uses the Assistants API to do this, and it is meant to run in batch mode (no end-user interaction). At present the model’s accuracy isn’t as good as I’d like, so I’m using the Assistants playground to improve it.
My process is: in the Assistants playground, I tweak the instructions, enter my input text, and examine the model’s response. I’ve tried both GPT-3.5 and GPT-4. If I can get this working with GPT-3.5, the cost savings would be significant.
How I can interact with the assistant in the playground differs significantly between GPT-4 and GPT-3.5:
- Using GPT-4: after I enter the initial user message and receive the assistant’s first response, I can have a back-and-forth with the assistant, asking about its reasoning and pointing out the mistakes it made. The model then recognizes the mistake, and I can ask it how I should change the instructions so it doesn’t repeat it. When I want to provide new input text, I tell the model “let’s try a new input” and it happily processes it.
- Using GPT-3.5: the model tries to extract facts from every user message (whereas GPT-4 extracts only from the first one), so I can’t have a back-and-forth to figure out what the model is thinking.
Can someone shed some light on (1) why this behavior differs, (2) whether I can control it, and (3) if so, how?
My assistant instructions:
You are an assistant that extracts information about events from my input. There can be any number of events in my input. You will extract these fields from each event: description of the event, start time and end time. Your response to my input should contain the event information in structured json output without using markdown code blocks.
When the input contains time zone information after the timestamp, please include it in the output.
When the input contains a pair of timestamps next to each other separated by a dash or minus sign or hyphen, these timestamps are the start time of the event and end time of the event.
When the input contains a pair of timestamps next to each other separated by a pipe symbol or a forward slash character, the first timestamp is the start time and the second timestamp should be ignored.
When the input contains multiple timestamps separated only by spaces, treat the first timestamp as the start time and ignore the other timestamps unless the text clearly specifies that they represent a different event or an end time.
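The timestamp rules above are deterministic enough that a small rule-based extractor can serve as a reference when grading the model’s output in batch mode. The sketch below is a minimal, hypothetical implementation, not part of my actual code. It assumes each event sits in its own sentence, timestamps look like `6:15 AM EDT` or `11 AM GMT` (hour, optional minutes, AM/PM, optional upper-case zone abbreviation), and the JSON keys `description`, `start_time` and `end_time` (the instructions don’t name the keys). It does not attempt the “clearly specified to represent a different event or an end time” exception, which needs language understanding.

```python
import json
import re

# Assumed timestamp shape: hour, optional ":MM", AM/PM, optional zone abbreviation,
# e.g. "6:15 AM EDT", "11 AM GMT", "3:00 PM".
TIME_RE = re.compile(r"\b\d{1,2}(?::\d{2})?\s?[AP]M\b(?:\s[A-Z]{2,5}\b)?")

# Hyphen, en dash, em dash and minus sign (escaped to keep the source ASCII).
RANGE_SEP_RE = re.compile(r"^\s*[-\u2013\u2014\u2212]\s*$")


def extract_events(text: str) -> list[dict]:
    """Apply the timestamp rules per sentence; one sentence = one event (assumption)."""
    events = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        stamps = list(TIME_RE.finditer(sentence))
        if not stamps:
            continue  # no timestamp, so no event under these rules
        start, end = stamps[0].group(), None
        if len(stamps) > 1:
            between = sentence[stamps[0].end():stamps[1].start()]
            if RANGE_SEP_RE.match(between):
                end = stamps[1].group()  # "A - B": start time and end time
            # "A | B", "A / B", "A B" (spaces only) and anything else: keep A, ignore the rest.
        # The description is simply the raw sentence in this sketch.
        events.append({"description": sentence, "start_time": start, "end_time": end})
    return events


if __name__ == "__main__":
    sample = (
        "I woke up at 6:00 AM EST / 11 AM GMT. I brushed my teeth 6:15 AM EDT - 6:20 AM EDT. "
        "At 6:30 AM GMT I started breakfast. The alarm went off at 6:25 AM EDT 9:25 AM PDT. "
        "I turned off the oven at 10:00 AM EDT | 3:00 PM GMT."
    )
    print(json.dumps(extract_events(sample), indent=2))
```

Because the second timestamp in a `/`, `|` or space-separated pair is dropped, only a dash-separated pair ever yields an `end_time`.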
My initial user message contents:
I woke up at 6:00 AM EST / 11 AM GMT. I brushed my teeth 6:15 AM EDT - 6:20 AM EDT. At 6:30 AM GMT I started breakfast. The alarm went off at 6:25 AM EDT 9:25 AM PDT. I turned off the oven at 10:00 AM EDT | 3:00 PM GMT.
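Applying those rules by hand to this message gives five events: only the teeth-brushing event has an end time, and the second timestamp in the `/`, `|` and space-separated pairs is ignored. A small harness like the hypothetical one below could turn that into an automated accuracy check in batch mode. It parses the assistant’s reply as JSON and compares only the start and end times, in order, since descriptions are free text. It assumes the reply is a bare JSON array (or an object wrapping the array under `events`) whose items use the keys `start_time` and `end_time`; those names are my assumption, so adjust them to whatever the model actually emits.

```python
import json

# Expected (start_time, end_time) pairs for the sample message, derived by hand
# from the timestamp rules; None means "no end time".
EXPECTED = [
    ("6:00 AM EST", None),
    ("6:15 AM EDT", "6:20 AM EDT"),
    ("6:30 AM GMT", None),
    ("6:25 AM EDT", None),
    ("10:00 AM EDT", None),
]


def check_reply(reply: str) -> list[str]:
    """Return mismatch messages for a model reply; an empty list means it matches."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return [f"reply is not valid JSON: {exc}"]
    # Assumed shapes: a bare array of events, or an object with an "events" array.
    events = data.get("events", []) if isinstance(data, dict) else data
    # Treat a missing or empty end_time as None.
    got = [(e.get("start_time"), e.get("end_time") or None) for e in events]
    problems = []
    if len(got) != len(EXPECTED):
        problems.append(f"expected {len(EXPECTED)} events, got {len(got)}")
    for i, (want, have) in enumerate(zip(EXPECTED, got)):
        if want != have:
            problems.append(f"event {i}: expected {want}, got {have}")
    return problems


if __name__ == "__main__":
    # Hypothetical reply that wrongly treats "9:25 AM PDT" as the alarm's end time.
    reply = json.dumps([
        {"description": "woke up", "start_time": "6:00 AM EST", "end_time": None},
        {"description": "brushed teeth", "start_time": "6:15 AM EDT", "end_time": "6:20 AM EDT"},
        {"description": "started breakfast", "start_time": "6:30 AM GMT", "end_time": None},
        {"description": "alarm went off", "start_time": "6:25 AM EDT", "end_time": "9:25 AM PDT"},
        {"description": "turned off oven", "start_time": "10:00 AM EDT", "end_time": None},
    ])
    print(check_reply(reply))  # one mismatch, reported for event 3
```

The comparison is an exact string match, so a reply that reformats times (for example `06:00 EST`) is flagged as a mismatch; normalizing times first would make the check more forgiving.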