Contextualizing completions: fine-tuning vs. dynamic prompt engineering using embeddings

Hi everyone and apologies for the long post. Just trying to give as much info as possible.

A little background on what I’m trying to do: I would like to generate completions based on the context of a specific project the company is working on. For example, say the company is working on multiple software development projects. Each project has its own set of artifacts (e.g., requirements, project schedule, documentation, user-generated content, etc.). If a developer on Project XYZ asks, for example, when the next release is scheduled for or what the description of a feature is, I would like the completions to be based on the context of Project XYZ’s artifacts—i.e., Q&A-style completions. I would also like to generate creative, open-ended completions based on Project XYZ’s context.

Following the “Question Answering using Embeddings” sample cookbook, I was able to get excellent results for Q&A-style questions. However, I have a couple of concerns with this approach:

  • Cost can become a factor since I would need to get embeddings for every possible context, and use up a lot of tokens for every completion by dynamically adding context to each prompt

  • While providing concise context for Q&A-style completions (where the answer is more deterministic) is feasible, it is more challenging for creative, open-ended questions since they require more context, adding further to the cost concern and sometimes running up against the prompt size limit (a rough token-counting sketch follows this list)
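
To illustrate the token overhead, here is a rough sketch using the tiktoken library (the question and context text are made-up placeholders); the point is that the injected context is paid for again on every single completion request:

```python
import tiktoken

# cl100k_base is the tokenizer used by text-embedding-ada-002 and newer models;
# GPT-3-era completion models use p50k_base / r50k_base instead.
enc = tiktoken.get_encoding("cl100k_base")

question = "When is the next release of Project XYZ scheduled?"
context_chunk = (
    "Project XYZ release plan: version 2.4 enters code freeze on May 2 "
    "and is scheduled to ship to customers on May 16."
)

prompt = (
    "Answer the question using the context below.\n\n"
    f"Context:\n{context_chunk}\n\n"
    f"Question: {question}\nAnswer:"
)

# The dynamically added context dominates the per-request token count.
print("question only:", len(enc.encode(question)), "tokens")
print("full prompt:  ", len(enc.encode(prompt)), "tokens")
```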

I was hoping I could achieve reasonably strong contextualization using fine-tuning, to avoid having to provide explicit project context in the prompt, and then adjust temperature for Q&A-style vs. creative completions. I tried a bunch of different fine-tuning data sets (e.g., open-ended no-prompt completions, tagged text, synthetic variations on the same text for reinforcement) but unfortunately couldn't get good results: the model either broke (e.g., kept repeating the same word/phrase) or produced non-contextualized completions.
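
For concreteness, here is a rough sketch of the kind of training example I mean, written out in the legacy prompt/completion JSONL format; the project details, separator, and stop sequence below are made-up placeholders, not my actual data:

```python
import json

# Made-up examples in the legacy prompt/completion JSONL format.
# The prompt suffix (" ->") and completion stop sequence ("\n###") are
# arbitrary conventions that have to be reused verbatim at inference time.
examples = [
    {
        "prompt": "Project XYZ | When is the next release scheduled? ->",
        "completion": " Version 2.4 is scheduled to ship on May 16.\n###",
    },
    {
        "prompt": "Project XYZ | Describe the single sign-on feature. ->",
        "completion": " Single sign-on lets users authenticate once via the corporate IdP.\n###",
    },
]

with open("xyz_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```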


The fine-tuning data set isn’t exactly large, so that might be the issue, but strong contextualization with fine-tuning alone might simply be infeasible. I’m curious what others in the community think about effective and efficient completion contextualization.

Many thanks!


My experience has been that the Q&A using embeddings approach is (much) better than fine tuning for the kind of task you are doing. You only need to get the embeddings for the contextual information once. At completion time, you just obtain the embedding for the query. Of course, if your contextual data is being updated often, you’d have to keep your embeddings up-to-date too, but I believe this can be done by adding the new embeddings to your existing ones, i.e. I don’t think you have to “re-embed” text for which you previously obtained embeddings.
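
Roughly, something like this (a minimal sketch, assuming the pre-1.0 openai Python client and text-embedding-ada-002; the chunk texts are placeholders): existing chunks keep their stored vectors and only new or changed chunks cost an embedding call.

```python
import openai  # pre-1.0 client; assumes OPENAI_API_KEY is set in the environment

EMBEDDING_MODEL = "text-embedding-ada-002"

def add_chunks(chunks, store):
    """Embed only chunks we haven't seen before; previously embedded text is reused as-is."""
    new_chunks = [c for c in chunks if c not in store]
    if new_chunks:
        resp = openai.Embedding.create(model=EMBEDDING_MODEL, input=new_chunks)
        for chunk, item in zip(new_chunks, resp["data"]):
            store[chunk] = item["embedding"]
    return store

# Initial load of the contextual information.
store = {}
store = add_chunks(["Project XYZ release plan ...", "Feature spec: single sign-on ..."], store)

# Later, when documents change, only the new/changed chunks get embedded.
store = add_chunks(["Project XYZ release plan, revision 2 ..."], store)
```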

Thanks for sharing, @lmccallum. I think that’s going to end up being it, really, but figured it doesn’t hurt to ask the community :)

Thanks again.

You can use a free, open source model to get embeddings. As for completions, it’s possible that you could fine-tune Curie on 400 perfect completions. However, my hunch is that you will need to stick with DaVinci instruct.

Thanks for the suggestions, @jhsmith12345. I’ll look into open source embeddings options.


You can also modify the logit_bias of the requests you send.

For example, use topic analysis and send those words with +2 values each (tokenized). This has downsides in that the model will become less grammatical, but it will be on topic consistently. This is similar to changing presence/frequency bias towards negative values.
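
A rough sketch of what I mean, assuming the pre-1.0 openai Python client, tiktoken for tokenization, and a made-up list of topic words:

```python
import openai   # pre-1.0 client
import tiktoken

# p50k_base is the tokenizer for text-davinci-003; older base models use r50k_base.
enc = tiktoken.get_encoding("p50k_base")

# Made-up topic words from your topic analysis; the leading space matters
# because most words tokenize differently with and without it.
topic_words = [" release", " schedule", " deployment"]

# logit_bias maps token IDs to a bias between -100 and 100; small positive
# values gently push the model toward these tokens.
logit_bias = {}
for word in topic_words:
    for token_id in enc.encode(word):
        logit_bias[str(token_id)] = 2

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a short status update for the Project XYZ team:",
    max_tokens=150,
    logit_bias=logit_bias,
)
print(response["choices"][0]["text"])
```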

“I would need to get embeddings for every possible context” - I guess you are not generating the embeddings with OpenAI every time you submit a request?

I generate the embeddings once for the source document (in chunks), and for every query/prompt thereafter to identify which source chunks to add to the prompt to provide the context.
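
Roughly like this, as a minimal sketch (pre-1.0 openai Python client assumed; the chunk texts and query are placeholders):

```python
import numpy as np
import openai  # pre-1.0 client; assumes OPENAI_API_KEY is set

EMBEDDING_MODEL = "text-embedding-ada-002"

def embed(text):
    resp = openai.Embedding.create(model=EMBEDDING_MODEL, input=[text])
    return np.array(resp["data"][0]["embedding"])

# One-time pass over the source document, in chunks.
chunks = [
    "Project XYZ release plan: version 2.4 ships to customers on May 16.",
    "Feature spec: single sign-on via the corporate identity provider.",
]
store = {c: embed(c) for c in chunks}

def top_chunks(query, store, k=3):
    """Rank stored chunks by cosine similarity to the query embedding."""
    q = embed(query)
    scored = sorted(
        ((float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), c)
         for c, v in store.items()),
        reverse=True,
    )
    return [c for _, c in scored[:k]]

# Per query: embed the query, pick the best chunks, and build the prompt.
query = "When is the next Project XYZ release?"
context = "\n\n".join(top_chunks(query, store, k=2))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
```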

OK, that’s what I thought. Maybe for the retrieval/embeddings part you could use Hugging Face models, like sentence transformers or DPR (Dense Passage Retrieval). And instead of sending the whole context, you could compress/summarize it (also using open source models) so that only the important entities and keywords remain, which could reduce token length.
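
For example, a minimal sketch of the local-embeddings idea with the sentence-transformers package and the all-MiniLM-L6-v2 model (the chunks and query are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

# Small model that runs locally, so there is no per-request embedding cost.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Project XYZ release plan: version 2.4 ships to customers on May 16.",
    "Feature spec: single sign-on via the corporate identity provider.",
]
query = "When is the next Project XYZ release?"

chunk_vecs = model.encode(chunks, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every chunk; keep the best match(es).
scores = util.cos_sim(query_vec, chunk_vecs)[0]
best = int(scores.argmax())
print(chunks[best], float(scores[best]))
```

The retrieved chunks could then be run through an open source summarizer or keyword extractor before being added to the prompt, which is the compression step mentioned above.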