I am currently working on my master’s thesis and wanted to fine-tune a GPT-3 Curie chatbot.
I succeeded, but because I used Q&A-style dialogues, a single input provokes a complete, “conversation-finishing” answer.
But I wanted to give it a more chatbot-like feel, so I found a dataset where the answer part actually asks questions back and the conversation goes on for a couple of iterations between the two parties.
Because fine-tuning requires “prompt/completion”-style input, though, I don’t know how to fine-tune Curie so that it keeps the context of the conversation.
I hope this is understandable; it’s quite a complicated problem.
This “memory” problem of GPT-3 is very common in the chatbot scenario. There have been quite a few posts on the community forum with potential solutions that involve using GPT-3 itself to condense/summarize the previous conversation, so as to retain context while making the most of the token limit.
My hypothesis is that it can be solved with a “rolling memory”, i.e. remembering the most recent N tokens. I haven’t had a chance to test it out though, as my grant expired in October last year.
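For what it’s worth, here’s a minimal sketch of what that rolling memory could look like, assuming the pre-v1 `openai` Python client; the model name, token budget, and prompt format are just placeholders:

```python
import openai

MAX_CONTEXT_TOKENS = 1500   # rough budget for the rolling window
history = []                # list of "Human: ..." / "Bot: ..." lines

def rough_token_count(text: str) -> int:
    # crude approximation: ~4 characters per token
    return len(text) // 4

def chat(user_message: str) -> str:
    history.append(f"Human: {user_message}")

    # keep only the most recent lines that still fit in the token budget
    window, used = [], 0
    for line in reversed(history):
        used += rough_token_count(line)
        if used > MAX_CONTEXT_TOKENS:
            break
        window.insert(0, line)

    prompt = "\n".join(window) + "\nBot:"
    response = openai.Completion.create(
        model="curie",        # or your fine-tuned model id
        prompt=prompt,
        max_tokens=150,
        temperature=0.7,
        stop=["Human:"],
    )
    reply = response["choices"][0]["text"].strip()
    history.append(f"Bot: {reply}")
    return reply
```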
To clarify my last response: the potential solutions I suggested are meant to be used at conversation time, not for fine-tuning the model.
Yes, do this with the existing fine-tuned model you have. You will quickly run out of tokens, though, if you pass along every single dialogue turn between user and bot; hence the need to condense.
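As a rough illustration of that condensing step (again assuming the pre-v1 `openai` client; the prompt wording is just an example), you could summarize the history with a separate call and prepend the summary instead of the full transcript:

```python
import openai

def condense_history(history: str) -> str:
    """Ask the model to compress the conversation so far into a short summary."""
    response = openai.Completion.create(
        model="curie",
        prompt=(
            "Summarize the following conversation in a few sentences, "
            "keeping names, facts, and open questions:\n\n"
            f"{history}\n\nSummary:"
        ),
        max_tokens=120,
        temperature=0.2,
    )
    return response["choices"][0]["text"].strip()
```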
I’ve been using the rolling memory idea. It’s “good”, but it’s surprising how far back a conversation really reaches; even with summarization, my ad-hoc experiments show the process breaking down after 4 or 5 “events”.
Yes, that’s kind of expected, given the token limit. Another one of my hypotheses would be using embeddings; that should give significantly more room for memory.
Here’s how this would go:
The latest conversation is appended to an archive.
At every point, the human’s message is used to search for and rank the N most semantically similar lines from the archive.
These N lines are then given to the completion engine as context to generate the response.
The generated response is sent back to the human as the reply.
Note that this will require figuring out the prompt(s) that get the completion engine to generate an appropriate response; see the sketch below.
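To make the loop concrete, here is a minimal sketch, assuming the pre-v1 `openai` Python client, the `text-embedding-ada-002` embedding model, and in-memory cosine similarity (the model names and prompt wording are just placeholders):

```python
import numpy as np
import openai

archive = []   # list of (line_text, embedding_vector) tuples

def embed(text: str) -> np.ndarray:
    response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(response["data"][0]["embedding"])

def remember(line: str) -> None:
    """Append the latest conversation line to the archive."""
    archive.append((line, embed(line)))

def recall(query: str, n: int = 5) -> list:
    """Rank archived lines by cosine similarity to the human's message."""
    q = embed(query)
    scored = [
        (np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)), line)
        for line, vec in archive
    ]
    return [line for _, line in sorted(scored, reverse=True)[:n]]

def reply(user_message: str) -> str:
    context = "\n".join(recall(user_message))
    prompt = (
        "Relevant earlier conversation:\n"
        f"{context}\n\n"
        f"Human: {user_message}\nBot:"
    )
    response = openai.Completion.create(
        model="curie", prompt=prompt, max_tokens=150, stop=["Human:"]
    )
    bot_message = response["choices"][0]["text"].strip()
    remember(f"Human: {user_message}")
    remember(f"Bot: {bot_message}")
    return bot_message
```

For anything beyond a toy archive you would swap the linear scan for a proper vector index, but the flow is the same.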
I started down a similar path but abandoned it for the same reason I don’t use the /answers endpoint: too many calls (and the costs get high fast), as well as poor performance (from /answers). Now, I haven’t tried embeddings, so there might be something to this I haven’t experimented with, but if it requires uploading files to serve as the data source, that could take too long to use realistically.
I don’t know why it’s taking ~30 min on your end, but on my end it takes about ~10 seconds to upload a file and get a response when I’m using the /answers API. My guess is that using embeddings and running the whole procedure shouldn’t take more than a minute.
Also, you can cut most of that time when the data is saved in the cloud on AWS/Azure, etc.
Possibly because I’m thinking of the fine-tuning indexing process. File uploads are fast, though the /answers API is, as I mentioned, somewhat costlier.
One of the things I’ve found helpful is to pass the chat history so far plus the new interaction, and ask the model to rewrite the new interaction so it incorporates the context from the chat history. This is a separate prompt/API call. You then get a single, self-contained question you can pass through the rest of your pipeline, avoiding it becoming self-reinforcing while still including the important references.
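A minimal sketch of that rewriting call, again assuming the pre-v1 `openai` client (the prompt wording is only an illustration, not the exact one I use):

```python
import openai

def rewrite_with_context(chat_history: str, new_message: str) -> str:
    """Rewrite the latest user message as a standalone question using the history."""
    response = openai.Completion.create(
        model="curie",
        prompt=(
            "Chat history:\n"
            f"{chat_history}\n\n"
            f"New message: {new_message}\n\n"
            "Rewrite the new message as a single standalone question that "
            "includes any names or facts it refers to from the chat history:\n"
        ),
        max_tokens=100,
        temperature=0.0,
    )
    return response["choices"][0]["text"].strip()
```

The rewritten standalone question then replaces the raw user message for the rest of the pipeline (retrieval, completion, etc.).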