Contextualizing completions: fine-tuning vs. dynamic prompt engineering using embeddings

Hi everyone and apologies for the long post. Just trying to give as much info as possible.

A little background on what I’m trying to do: I would like to generate completions based on the context of a specific project the company is working on. For example, say the company is working on multiple software development projects. Each project has its own set of artifacts (e.g., requirements, project schedule, documentation, user-generated content, etc.). If a developer on Project XYZ asks, for example, when the next release is scheduled for or what the description of a feature is, I would like the completions to be based on the context of Project XYZ’s artifacts—i.e., Q&A-style completions. I would also like to generate creative, open-ended completions based on Project XYZ’s context.

Following the “Question Answering using Embeddings” sample cookbook, I was able to get excellent results for Q&A-style questions. However, I have a couple of concerns with this approach:

  • Cost can become a factor since I would need to get embeddings for every possible context, and use up a lot of tokens for every completion by dynamically adding context to each prompt

  • While providing concise context for Q&A-style completions (where the answer is more deterministic) is feasible, it is more challenging for creative, open-ended questions since they require more context, adding further to the cost concern and sometimes running up against the prompt size limit (a rough token-counting sketch follows this list)
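
To illustrate the token overhead, here is a rough sketch using the tiktoken library (the question and context text are made-up placeholders); the point is that the injected context is paid for again on every single completion request:

```python
import tiktoken

# cl100k_base is the tokenizer used by text-embedding-ada-002 and newer models;
# GPT-3-era completion models use p50k_base / r50k_base instead.
enc = tiktoken.get_encoding("cl100k_base")

question = "When is the next release of Project XYZ scheduled?"
context_chunk = (
    "Project XYZ release plan: version 2.4 enters code freeze on May 2 "
    "and is scheduled to ship to customers on May 16."
)

prompt = (
    "Answer the question using the context below.\n\n"
    f"Context:\n{context_chunk}\n\n"
    f"Question: {question}\nAnswer:"
)

# The dynamically added context dominates the per-request token count.
print("question only:", len(enc.encode(question)), "tokens")
print("full prompt:  ", len(enc.encode(prompt)), "tokens")
```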

I was hoping I could achieve reasonably strong contextualization using fine-tuning, to avoid having to provide explicit project context in the prompt, and then adjust temperature for Q&A-style vs. creative completions. I tried a bunch of different fine-tuning data sets (e.g., open-ended no-prompt completions, tagged text, synthetic variations on the same text for reinforcement) but unfortunately couldn't get good results: the model either broke (e.g., kept repeating the same word/phrase) or produced non-contextualized completions.
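
For concreteness, here is a rough sketch of the kind of training example I mean, written out in the legacy prompt/completion JSONL format; the project details, separator, and stop sequence below are made-up placeholders, not my actual data:

```python
import json

# Made-up examples in the legacy prompt/completion JSONL format.
# The prompt suffix (" ->") and completion stop sequence ("\n###") are
# arbitrary conventions that have to be reused verbatim at inference time.
examples = [
    {
        "prompt": "Project XYZ | When is the next release scheduled? ->",
        "completion": " Version 2.4 is scheduled to ship on May 16.\n###",
    },
    {
        "prompt": "Project XYZ | Describe the single sign-on feature. ->",
        "completion": " Single sign-on lets users authenticate once via the corporate IdP.\n###",
    },
]

with open("xyz_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```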


The fine-tuning data set isn’t exactly large, so that might be the issue, but strong contextualization with fine-tuning alone might simply be infeasible. I’m curious what others in the community think about effective and efficient completion contextualization.

Many thanks!


My experience has been that the Q&A using embeddings approach is (much) better than fine tuning for the kind of task you are doing. You only need to get the embeddings for the contextual information once. At completion time, you just obtain the embedding for the query. Of course, if your contextual data is being updated often, you’d have to keep your embeddings up-to-date too, but I believe this can be done by adding the new embeddings to your existing ones, i.e. I don’t think you have to “re-embed” text for which you previously obtained embeddings.
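
Roughly, something like this (a minimal sketch, assuming the pre-1.0 openai Python client and text-embedding-ada-002; the chunk texts are placeholders): existing chunks keep their stored vectors and only new or changed chunks cost an embedding call.

```python
import openai  # pre-1.0 client; assumes OPENAI_API_KEY is set in the environment

EMBEDDING_MODEL = "text-embedding-ada-002"

def add_chunks(chunks, store):
    """Embed only chunks we haven't seen before; previously embedded text is reused as-is."""
    new_chunks = [c for c in chunks if c not in store]
    if new_chunks:
        resp = openai.Embedding.create(model=EMBEDDING_MODEL, input=new_chunks)
        for chunk, item in zip(new_chunks, resp["data"]):
            store[chunk] = item["embedding"]
    return store

# Initial load of the contextual information.
store = {}
store = add_chunks(["Project XYZ release plan ...", "Feature spec: single sign-on ..."], store)

# Later, when documents change, only the new/changed chunks get embedded.
store = add_chunks(["Project XYZ release plan, revision 2 ..."], store)
```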

Thanks for sharing, @lmccallum. I think that’s going to end up being it, really, but figured it doesn’t hurt to ask the community :)

Thanks again.

You can use a free, open source model to get embeddings. As for completions, it’s possible that you could fine-tune Curie on 400 perfect completions. However, my hunch is that you will need to stick with DaVinci instruct.

Thanks for the suggestions, @jhsmith12345. I’ll look into open source embeddings options.


You can also modify the logit_bias of the requests you send.

For example, use topic analysis and send those words with +2 values each (tokenized). This has downsides in that the model will become less grammatical, but it will be on topic consistently. This is similar to changing presence/frequency bias towards negative values.
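
A rough sketch of what I mean, assuming the pre-1.0 openai Python client, tiktoken for tokenization, and a made-up list of topic words:

```python
import openai   # pre-1.0 client
import tiktoken

# p50k_base is the tokenizer for text-davinci-003; older base models use r50k_base.
enc = tiktoken.get_encoding("p50k_base")

# Made-up topic words from your topic analysis; the leading space matters
# because most words tokenize differently with and without it.
topic_words = [" release", " schedule", " deployment"]

# logit_bias maps token IDs to a bias between -100 and 100; small positive
# values gently push the model toward these tokens.
logit_bias = {}
for word in topic_words:
    for token_id in enc.encode(word):
        logit_bias[str(token_id)] = 2

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a short status update for the Project XYZ team:",
    max_tokens=150,
    logit_bias=logit_bias,
)
print(response["choices"][0]["text"])
```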

“I would need to get embeddings for every possible context” - I guess you are not generating the embeddings with OpenAI every time you submit a request?

I generate the embeddings once for the source document (in chunks), and for every query/prompt thereafter to identify which source chunks to add to the prompt to provide the context.
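
Roughly like this, as a minimal sketch (pre-1.0 openai Python client assumed; the chunk texts and query are placeholders):

```python
import numpy as np
import openai  # pre-1.0 client; assumes OPENAI_API_KEY is set

EMBEDDING_MODEL = "text-embedding-ada-002"

def embed(text):
    resp = openai.Embedding.create(model=EMBEDDING_MODEL, input=[text])
    return np.array(resp["data"][0]["embedding"])

# One-time pass over the source document, in chunks.
chunks = [
    "Project XYZ release plan: version 2.4 ships to customers on May 16.",
    "Feature spec: single sign-on via the corporate identity provider.",
]
store = {c: embed(c) for c in chunks}

def top_chunks(query, store, k=3):
    """Rank stored chunks by cosine similarity to the query embedding."""
    q = embed(query)
    scored = sorted(
        ((float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), c)
         for c, v in store.items()),
        reverse=True,
    )
    return [c for _, c in scored[:k]]

# Per query: embed the query, pick the best chunks, and build the prompt.
query = "When is the next Project XYZ release?"
context = "\n\n".join(top_chunks(query, store, k=2))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
```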

OK, that’s what I thought. Maybe for the retrieval/embeddings part you could use Hugging Face models, like sentence transformers or DPR (Dense Passage Retrieval). And instead of sending the whole context, you could compress/summarize it (also using open source models) so that only the important entities and keywords remain, which could reduce token length.
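
For example, a minimal sketch of the local-embeddings idea with the sentence-transformers package and the all-MiniLM-L6-v2 model (the chunks and query are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

# Small model that runs locally, so there is no per-request embedding cost.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Project XYZ release plan: version 2.4 ships to customers on May 16.",
    "Feature spec: single sign-on via the corporate identity provider.",
]
query = "When is the next Project XYZ release?"

chunk_vecs = model.encode(chunks, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every chunk; keep the best match(es).
scores = util.cos_sim(query_vec, chunk_vecs)[0]
best = int(scores.argmax())
print(chunks[best], float(scores[best]))
```

The retrieved chunks could then be run through an open source summarizer or keyword extractor before being added to the prompt, which is the compression step mentioned above.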