Fine-tuning with contextual embeddings

I am trying to create an application to analyze a given legal contract and answer a standard set of questions (like effective date, contracting entity, etc.).
I was able to do this to some extent using the below approach:
Break the document into chunks => create an embedding vector for each chunk => find the chunk whose embedding is closest to the query embedding => call the text completion API with that chunk as context
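The retrieval part of that pipeline can be sketched roughly as below. The `embed` function here is only a toy bag-of-words stand-in so the sketch runs offline; in practice it would be a call to a real embedding model, and the chunking would be by tokens rather than words.

```python
import math
import zlib

def embed(text: str, dim: int = 512) -> list[float]:
    # Toy hashing-trick bag-of-words embedding. Purely a stand-in for
    # a real embedding model so this example is self-contained.
    vec = [0.0] * dim
    for word in text.lower().split():
        word = word.strip(".,?!;:")
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def chunk(doc: str, size: int = 10) -> list[str]:
    # Naive fixed-size word chunking; real systems chunk by tokens.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def best_chunk(doc: str, query: str) -> str:
    # Retrieve the chunk whose embedding is closest to the query's.
    q = embed(query)
    return max(chunk(doc), key=lambda c: cosine(embed(c), q))

contract = ("This agreement is made between Acme Corp and Beta LLC. "
            "The effective date of this agreement is January 1, "
            "2023. Payment terms are net thirty days from invoice.")
context = best_chunk(contract, "What is the effective date")
# `context` is then inserted into the completion prompt, e.g.
# f"Context:\n{context}\n\nQuestion: What is the effective date?\nAnswer:"
```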

But the main problem is that the answers are approximately right, but not exactly what I want. Ideally, I would like to train the model with examples from a sample set of documents.
Like this
“Document A, Question 1, Answer 1, Question 2, Answer 2;
Document B, Question 1, Answer 1, Question 2, Answer 2…”

But the fine-tuning API does not let me pass any context with the training examples. How do I accomplish this?


Based on your post, jumping to fine-tuning is premature.

  1. You can improve your chunking strategy: smaller chunks, and “smarter” chunks (injecting context during preprocessing).

  2. You can improve your system prompt.
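One way to read "injecting context during preprocessing" is to prepend document and section metadata to each chunk before embedding, so that even small chunks carry their surroundings. A minimal sketch, with made-up section names:

```python
def contextualize_chunks(doc_title: str, sections: dict[str, str],
                         size: int = 50) -> list[str]:
    # Prepend document and section metadata to every chunk so that
    # even small chunks embed with their surrounding context.
    out = []
    for heading, body in sections.items():
        words = body.split()
        for i in range(0, len(words), size):
            piece = " ".join(words[i:i + size])
            out.append(f"[{doc_title} | {heading}] {piece}")
    return out

chunks = contextualize_chunks(
    "Master Services Agreement",
    {"Term": "This Agreement takes effect on January 1, 2023.",
     "Parties": "The contracting entities are Acme Corp and Beta LLC."},
)
```

Because the title and heading are embedded together with the chunk text, a query like "effective date of the Master Services Agreement" can match even when the chunk body itself never names the document.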

And if you’re not using HyDE, you should try that, too.
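For readers unfamiliar with HyDE (Hypothetical Document Embeddings): instead of embedding the short question directly, you first generate a hypothetical answer passage and embed that, since it shares more vocabulary with real contract text. A sketch, where the generator is a placeholder for what would in practice be an LLM call:

```python
def generate_hypothetical(question: str) -> str:
    # Placeholder for an LLM call: in practice, a completion asked to
    # "write a plausible contract clause answering this question".
    # The template below is purely illustrative.
    return (f"In response to: {question} "
            "The effective date of this Agreement shall be [DATE], "
            "by and between [PARTY A] and [PARTY B].")

def hyde_query_text(question: str) -> str:
    # HyDE: retrieve chunks nearest to the hypothetical answer's
    # embedding rather than the raw question's embedding.
    return generate_hypothetical(question)

query_text = hyde_query_text("When does this contract take effect?")
# embed(query_text) would then replace embed(question) in retrieval.
```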


Hi @dr.rsvidhya, and welcome to the community!

As @wfhbrian mentioned, fine-tuning may not be needed for your objectives yet. I think your process looks something like this:

In contrast, consider an architecture that uses a few-shot prompt to give the completion clearer guidance on the expected answers.
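A few-shot prompt along these lines might look like the sketch below. The example triples are hypothetical; in practice they would be (excerpt, question, answer) examples you have already labeled from your sample documents, combined with the retrieved chunk at the end:

```python
# Hypothetical worked examples: (excerpt, question, answer) triples
# drawn from sample documents you have already labeled.
FEW_SHOT = [
    ("The term of this Agreement commences on March 5, 2021.",
     "What is the effective date?",
     "March 5, 2021"),
    ("This Agreement is entered into by Gamma Inc. and Delta Ltd.",
     "Who are the contracting entities?",
     "Gamma Inc. and Delta Ltd."),
]

def build_prompt(context: str, question: str) -> str:
    # Prepend the worked examples, then the retrieved chunk and the
    # real question, so the model sees the expected answer format.
    parts = ["Answer each question using only the given contract excerpt."]
    for c, q, a in FEW_SHOT:
        parts.append(f"Excerpt: {c}\nQuestion: {q}\nAnswer: {a}")
    parts.append(f"Excerpt: {context}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "This agreement is effective as of January 1, 2023.",
    "What is the effective date?",
)
```

This gets you the "Document, Question, Answer" training signal you were after, but in the prompt at inference time rather than through the fine-tuning API.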

I’ve had pretty good success with this approach.