Fine-tuning with contextual embeddings

dr.rsvidhya · May 7, 2023, 8:48am

I am trying to create an application to analyze a given legal contract and answer a standard set of questions (like effective date, contracting entity, etc.).
I was able to do this to some extent using the approach below:
Break the document into chunks => Create embedding vectors => Find the chunk whose embedding is closest to the query embedding => Call the text completion API with that chunk as context
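
For readers following along, here is a minimal sketch of that pipeline. Everything in it is illustrative: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `chunk_text`, `closest_chunk`, and `build_prompt` are hypothetical helper names, not part of any API.

```python
import math
import re
from collections import Counter

def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def toy_embed(text):
    """Stand-in embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_chunk(chunks, query, embed=toy_embed):
    """Return the chunk whose embedding is most similar to the query embedding."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(embed(c), q))

def build_prompt(context, question):
    """Assemble the text that would be sent to the completion endpoint."""
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The string returned by `build_prompt` is what would go to the completion API; the API call itself is left out because it depends on the client library in use.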

The main problem is that the answers are approximately right, but not exactly what I want. Ideally, I would train the model with some examples from a sample set of documents.
Like this:
“Document A, Question 1, Answer 1, Question 2, Answer 2;
Document B, Question 1, Answer 1, Question 2, Answer 2…”

But the fine-tuning API does not allow passing any context in the training examples. How do I accomplish this?

wfhbrian · May 7, 2023, 12:18pm

Based on your post, jumping to fine-tuning is premature.

You can improve your chunking strategy. Smaller chunks. “Smarter” chunks (injecting context during the preprocessing).
You can improve your system prompt.
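
As a rough sketch of what "injecting context during the preprocessing" could look like: prefix every chunk with where it came from, so the embedding carries the document and section along with the text. The function name, the `(heading, body)` input shape, and the bracketed prefix format are all assumptions made for illustration.

```python
def contextual_chunks(doc_title, sections, max_words=40):
    """Build chunks that each carry their document title and section heading.

    sections is a list of (heading, body) pairs; how a contract gets split
    into sections is outside the scope of this sketch.
    """
    chunks = []
    for heading, body in sections:
        words = body.split()
        # Smaller chunks: cut each section body into windows of max_words words.
        for i in range(0, len(words), max_words):
            piece = " ".join(words[i:i + max_words])
            # "Smarter" chunks: prepend the origin so it is embedded too.
            chunks.append(f"[{doc_title} | {heading}] {piece}")
    return chunks
```

With a small `max_words`, a clause such as the effective date tends to end up alone in its chunk while still being labeled with its section.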

And if you’re not using HyDE (Hypothetical Document Embeddings), you should try that, too.
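
For anyone who hasn't met HyDE: instead of embedding the question, you first ask a model to write a hypothetical passage that would answer it, then embed that passage and search with it. A minimal sketch follows, assuming you supply your own `generate`, `embed`, and `similarity` callables; those parameter names and the prompt wording are placeholders, not a specific library's API.

```python
def hyde_search(query, chunks, generate, embed, similarity):
    """Retrieve the chunk closest to a hypothetical answer, not to the raw query.

    generate:   callable(prompt) -> text, e.g. a wrapper around a completion call
    embed:      callable(text) -> embedding
    similarity: callable(embedding, embedding) -> number, higher means closer
    """
    # Step 1: have the model draft a passage that would answer the question.
    hypothetical = generate(f"Write a short contract passage that answers: {query}")
    # Step 2: embed the drafted passage in place of the query.
    target = embed(hypothetical)
    # Step 3: pick the real chunk that sits closest to the drafted passage.
    return max(chunks, key=lambda c: similarity(embed(c), target))
```

The idea is that a drafted answer usually reads more like the contract text than a short question does, so its embedding tends to land nearer the right chunk.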

bill.french · May 7, 2023, 2:17pm

Hi @dr.rsvidhya, and welcome to the community!

As @wfhbrian mentioned, fine-tuning may not be needed for your objectives yet. I think your process looks something like this:

In contrast, consider this architecture that uses a learner shot to provide more answer clarity for the completion.

I’ve had pretty good success with this approach.
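
One possible reading of the "learner shot" idea, sketched as code: place one or more worked examples (excerpt, question, ideal answer) ahead of the retrieved chunk, so the completion imitates the answer format you want. The function name, the instruction sentence, and the `Excerpt:` / `Question:` / `Answer:` labels are assumptions for illustration, not the exact prompt used above.

```python
def build_few_shot_prompt(shots, context, question):
    """Build a completion prompt from worked examples plus the live question.

    shots is a list of (excerpt, question, answer) tuples taken from sample
    contracts; context is the chunk retrieved for the current question.
    """
    parts = ["Answer the question using only the contract excerpt provided."]
    for ex_context, ex_question, ex_answer in shots:
        parts.append(f"Excerpt: {ex_context}\nQuestion: {ex_question}\nAnswer: {ex_answer}")
    # The live question comes last, with the answer left open for the model.
    parts.append(f"Excerpt: {context}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)
```

This also covers the "Document A, Question 1, Answer 1" examples from the original question: they go into the prompt at query time instead of into a fine-tuning set.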
