How to build a fine-tune Q/A model

mbabayev · August 6, 2023, 4:16pm

I’m building a fine-tuned Question/Answer model based on these examples:
https://github.com/openai/openai-cookbook/tree/main/examples/fine-tuned_qa
https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb

If I understand correctly, the training data for the model should look like this:

for related context:
{"prompt":f"{row.context}\nQuestion: {row.q}\nAnswer:", "completion":f" {row.a}"}
for unrelated context:
{"prompt":f"{row.context}\nQuestion: {another_row.q}\nAnswer:", "completion":f" No appropriate context found to answer the question"}
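A minimal sketch of how these two record types could be written out as JSONL follows. The `Row` class and the helpers `build_examples` and `to_jsonl` are hypothetical names introduced for illustration. Pairing each context with the next row's question is also an assumption; real data needs a check that the borrowed question truly cannot be answered from that context.

```python
import json
from dataclasses import dataclass

# Fallback answer used for unrelated context (text taken from the post).
NO_CONTEXT = " No appropriate context found to answer the question"


@dataclass
class Row:  # hypothetical container for one context/question/answer triple
    context: str
    q: str
    a: str


def build_examples(rows):
    """Return a related example per row, plus an unrelated one when possible."""
    examples = []
    for i, row in enumerate(rows):
        # Related: the row's own context with its own question and answer.
        examples.append({
            "prompt": f"{row.context}\nQuestion: {row.q}\nAnswer:",
            "completion": f" {row.a}",
        })
        # Unrelated: the same context with the next row's question (assumption).
        another_row = rows[(i + 1) % len(rows)]
        if another_row is not row:
            examples.append({
                "prompt": f"{row.context}\nQuestion: {another_row.q}\nAnswer:",
                "completion": NO_CONTEXT,
            })
    return examples


def to_jsonl(examples):
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(e) for e in examples)
```

Calling `to_jsonl(build_examples(rows))` on a list of `Row` objects gives text that can be saved as a `.jsonl` training file.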

Is this the only data format for training a fine-tuned model? Are there examples of Q/A models that don’t involve embeddings?
When I query the trained model, do I have to find a relevant context (by using embeddings) and add it to the query?
prompt = f"{related_context}\nQuestion: {question}\nAnswer:"
Can I query the model without searching for a context? I want to send only the question, as with the base models:
prompt = f"{question}"

When I query my trained model, I receive answers that contain multiple occurrences of “Question:” and “No appropriate context found to answer the question” in the same choice. I think the model should return only one sentence with such tokens. Shouldn’t it?
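One likely cause of this kind of repetition is that the training completions have no end marker, so the model keeps generating past the answer. A minimal sketch of one way to handle it follows. It assumes a newline is used as the end marker, and `with_stop` and `truncate_at_stop` are hypothetical helper names, not part of any library.

```python
# Assumed end marker; any string that never appears inside an answer would do.
STOP = "\n"


def with_stop(completion, stop=STOP):
    """Append the end marker to a training completion if it is missing."""
    return completion if completion.endswith(stop) else completion + stop


def truncate_at_stop(text, stops=(STOP, "Question:")):
    """Cut generated text at the earliest stop marker, on the client side."""
    cut = len(text)
    for marker in stops:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()
```

The Completions API also accepts a `stop` parameter, which applies the same kind of cut on the server side when the request is made.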

Foxalabs · August 7, 2023, 6:34am

Hi and welcome to the forum!

You seem to be mixing terms from two separate things. One is fine-tuning, which requires data to be input in the format you showed in your post; the other is embeddings, which are vectorised forms of text blocks, typically used for storage and retrieval of large corpora of data.

They are separate systems, although they can be used together: you can run a vector search on your query terms to build up a relevant context from your dataset, then use that context as the prompt in a call to a fine-tuned model.

You can find documentation for Embedding here

mbabayev · August 14, 2023, 2:08pm

Hello Fozabilo,
Thank you for responding.

So, to summarize, if I fine-tune a model on these prompts:
{"prompt":f"{row.context}\nQuestion: {row.q}\nAnswer:", "completion":f" {row.a}"}
{"prompt":f"{row.context}\nQuestion: {another_row.q}\nAnswer:", "completion":f" No appropriate context found to answer the question"}
What prompt should be used when a user queries the model?
Regarding the embeddings part: I create embeddings from the contexts but compare them against a user question. The context is a long text and the user question is a short sentence. Is this the correct way to use embeddings? I always receive a score below 0.3. What should I do, add more data?

Foxalabs · August 14, 2023, 3:37pm

Hi,

The user should be able to type anything. That is the point of large language models: they take messy, imprecise human input, determine what the user’s goal is, and then present that in a (hopefully) standardised format that you can take advantage of programmatically.
Embeddings allow you to take the user’s query and see what (if any) of your dataset might contain similar things or concepts; once you have that data, you can pass it to the language model to use as context.
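The flow described above can be sketched roughly as follows. The sketch assumes the embedding vectors have already been computed elsewhere and uses plain lists of floats as stand-ins. `cosine_similarity`, `best_context` and `build_prompt` are hypothetical helper names, and the prompt layout simply mirrors the training format quoted earlier in the thread.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_context(question_vec, contexts):
    """Return (text, score) for the best-matching context.

    contexts is a non-empty list of (text, vector) pairs.
    """
    scored = ((text, cosine_similarity(question_vec, vec)) for text, vec in contexts)
    return max(scored, key=lambda pair: pair[1])


def build_prompt(context, question):
    """Build a query prompt in the same layout as the training prompts."""
    return f"{context}\nQuestion: {question}\nAnswer:"
```

This version picks the highest-scoring context by rank and applies no absolute score cutoff; whether a minimum score should also be enforced is left open here.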
