Hi all,
For a research project, I am investigating whether GPT-3 can be used for to-do detection. I have several transcripts that contain tasks, and I want to extract them automatically. Because these transcripts are too long for GPT-3, I split them into smaller sub-transcripts. For every sub-transcript, I have written down whether it contains any to-dos, and if so, which to-dos and who is assigned to them.
To detect the tasks, I have come up with a small pipeline:
- Ask GPT-3 if the conversation contains any to-dos
- If the answer is yes, ask GPT-3 to write down the to-dos and who is assigned to them.
If I skip step 1 and only work with the second prompt, GPT-3 notes many irrelevant or incorrect to-dos.
The prompts look like the following:
Step 1:
Microphone RED: …
Microphone BLUE: …
Microphone RED: …
Microphone RED: …
Microphone YELLOW: …
Microphone YELLOW: …
Microphone BLUE: …
Microphone BLUE: …
Microphone YELLOW: …
Microphone BLUE: …
Microphone RED: …
Microphone BLUE: …
Microphone YELLOW: …
Microphone RED: …
Microphone RED: …
Microphone RED: …
Microphone YELLOW: …
Microphone YELLOW: …
Microphone RED: …
Microphone BLUE: …
Microphone RED: …
Microphone RED: …
Microphone RED: …
Microphone YELLOW: …
Microphone YELLOW: …
Microphone BLUE: …
Microphone BLUE: …
Microphone BLUE: …
Microphone YELLOW: …
Microphone RED: …
Question: Does this conversation contain any explicit to-dos for Blue, Yellow, or Red?
Answer (yes/no):
Step 2:
Microphone RED: …
Microphone BLUE: …
Microphone RED: …
Microphone RED: …
Microphone YELLOW: …
Microphone YELLOW: …
Microphone BLUE: …
Microphone BLUE: …
Microphone YELLOW: …
Microphone BLUE: …
Microphone RED: …
Microphone BLUE: …
Microphone YELLOW: …
Microphone RED: …
Microphone RED: …
Microphone RED: …
Microphone YELLOW: …
Microphone YELLOW: …
Microphone RED: …
Microphone BLUE: …
Microphone RED: …
Microphone RED: …
Microphone RED: …
Microphone YELLOW: …
Microphone YELLOW: …
Microphone BLUE: …
Microphone BLUE: …
Microphone BLUE: …
Microphone YELLOW: …
Microphone RED: …
Write down the tasks, with the person who has to do them in brackets behind each task.
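For completeness, this is roughly how I chain the two steps with the openai Python library (a simplified sketch using the legacy Completions API; the model name, max_tokens values, and helper names are placeholders rather than my exact code):

```python
import openai
from typing import Optional

# Placeholder model name; not necessarily what I use in the real experiments.
MODEL = "text-davinci-002"

STEP1_QUESTION = (
    "\nQuestion: Does this conversation contain any explicit to-dos "
    "for Blue, Yellow, or Red?\nAnswer (yes/no):"
)
STEP2_INSTRUCTION = (
    "\nWrite down the tasks, with the person who has to do them "
    "in brackets behind each task.\n"
)

def contains_todos(sub_transcript: str) -> bool:
    """Step 1: ask GPT-3 whether the sub-transcript contains explicit to-dos."""
    response = openai.Completion.create(
        model=MODEL,
        prompt=sub_transcript + STEP1_QUESTION,
        max_tokens=1,
        temperature=0,
    )
    return response["choices"][0]["text"].strip().lower() == "yes"

def extract_todos(sub_transcript: str) -> str:
    """Step 2: ask GPT-3 to list the to-dos with the assignee in brackets."""
    response = openai.Completion.create(
        model=MODEL,
        prompt=sub_transcript + STEP2_INSTRUCTION,
        max_tokens=256,
        temperature=0,
    )
    return response["choices"][0]["text"].strip()

def process_sub_transcript(sub_transcript: str) -> Optional[str]:
    """Only run the extraction prompt (step 2) if step 1 answered yes."""
    if contains_todos(sub_transcript):
        return extract_todos(sub_transcript)
    return None
```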
Now I have some questions about this project:
- Am I using the right prompt for step 1, or do you perhaps have a better suggestion?
- I want to see whether fine-tuning improves the performance, especially for step 1. I fine-tuned a model with 91 prompts (I realize this is a small dataset, but I just wanted to see whether it already changes the performance). However, the fine-tuned model gave strange output. With the regular GPT-3 model, I receive either yes or no as an answer. It performed OK, with an accuracy of 0.62, and it was also consistent, giving the same response when I ran it 5 times. With my fine-tuned models, the output was no longer a single yes or no, but looked more like this:
yes yes no no no no no no yes no yes yes yes yes
I have no idea what is going wrong here, because the examples I used to fine-tune the model only have either yes or no as the ideal completion. Does anyone have an idea what I did wrong? Or is GPT-3 simply not able to answer this question?
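For reference, each of my step 1 fine-tuning examples is a prompt/completion pair. Below is a heavily simplified sketch of how such a training file could be written (the real prompts contain a full sub-transcript, the file name is a placeholder, and I am leaving out any separators or stop sequences here):

```python
import json

# Two heavily simplified step-1 training examples in the prompt/completion
# format used by the legacy GPT-3 fine-tuning endpoint. The leading space in
# the completions follows the OpenAI fine-tuning guide's recommendation.
examples = [
    {
        "prompt": "Microphone RED: …\nMicrophone BLUE: …\n"
                  "Question: Does this conversation contain any explicit to-dos "
                  "for Blue, Yellow, or Red?\nAnswer (yes/no):",
        "completion": " yes",
    },
    {
        "prompt": "Microphone YELLOW: …\nMicrophone RED: …\n"
                  "Question: Does this conversation contain any explicit to-dos "
                  "for Blue, Yellow, or Red?\nAnswer (yes/no):",
        "completion": " no",
    },
]

# Write the examples as JSONL, the file format expected by the fine-tuning tooling.
with open("finetune_step1.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```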
Thank you!