Is there a way to use the Completions API to replicate the Search API but using a fine-tuned model (i.e. does anyone know what the Search endpoint’s prompt is)? Or will the Search endpoint support fine-tuned models soon?
Here’s the context in case there is a better way that I could approach the problem:
Our product serves product teams at business-to-business (B2B) companies, starting with SaaS companies.
We want to connect a company’s product roadmap to customer conversations. The end goal is being able to say “this feature was critical to this customer buying this product” even for features that aren’t a separate product (or the inverse, this customer won’t buy our product because we don’t have feature XYZ). We think this is possible today because of technologies like Gong, which record and transcribe all customer conversations.
Unlike in an “Answers” use case, completeness is important and the value of completeness is linear. If 10 customers discuss the need for a feature, it is twice as good if we are able to identify 8 of those customers vs just 4.
Domain expertise is critical for this use case because customers and team members often refer to the same feature/capability in a number of different ways:
- They could describe the feature technically
- They could use the marketing terms for it
- They could lay out the pain point the feature solves
- They could explain their use case that the feature addresses
All of this information is captured in a company’s knowledge base (KB) in some form (or their external docs).
The volume of data generated is too large to run everything through the API (from both a time and cost perspective). A mid-sized (~500 person) B2B SaaS company may generate over 400 hrs of transcripts every day (that’s 32M tokens generated a month). This might be feasible as a one-time classification task, but we want to be able to offer real-time search and ongoing tracking of features.
Since these features may be talked about in a myriad of ways without the feature being explicitly mentioned, using the File-based Search Endpoint hasn’t yielded adequate performance because it uses keyword search to winnow down the results. Thus, we’ve added a step (Step 1 below) to improve the list of initial documents that are then fed into the Semantic Search.
Setup: the model is fine-tuned on the KB through the method described as “Case study: Creating an expert model in the legal domain which understands internal company jargon” on this page of the OpenAI documentation.
1. When a search occurs (let’s say “Single Sign-On”), we first ask the fine-tuned `babbage` model to create keywords from the search phrase. Setting Temp=0 and TopP=1 has yielded excellent domain-specific keywords with the fine-tuned model (i.e. the equivalent of returning “authentication” for the search “Single Sign-On”, but for a company’s specific terminology).
   a. The prompt we use is “Extract technical and customer-facing keywords from the following text.” along with 3 examples.
2. We run a keyword search (not using OpenAI for it) and get documents that match any of the generated keywords or the original search phrase.
3. We call the Search endpoint with a standard `babbage` model (we’ve also experimented with `curie` but haven’t seen much difference). We use the original search phrase and pass in the matched documents from Step 2.
4. Sort the results by rank (after filtering out Rank < 200) and return them.
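To make the pipeline concrete, here is a minimal sketch of the non-API parts of Steps 1, 2, and 4. The `expand_keywords` stub stands in for the fine-tuned `babbage` Completions call; its output, the document strings, and the rank cutoff semantics are illustrative assumptions, not our actual implementation.

```python
def expand_keywords(search_phrase: str) -> list[str]:
    """Stand-in for Step 1: the fine-tuned babbage Completions call
    (temperature=0, top_p=1) that maps a search phrase to
    domain-specific keywords. Output here is a hypothetical example."""
    return ["authentication", "SAML", "identity provider"]

def keyword_filter(docs: list[str], phrase: str, keywords: list[str]) -> list[str]:
    """Step 2: keep documents matching the original phrase or any
    generated keyword (case-insensitive substring match)."""
    terms = [phrase.lower()] + [k.lower() for k in keywords]
    return [d for d in docs if any(t in d.lower() for t in terms)]

def rank_filter(scored_docs: list[tuple[str, float]], cutoff: float = 200.0) -> list[str]:
    """Step 4: drop results whose rank falls below the cutoff,
    then return the rest sorted best-first."""
    kept = [(doc, rank) for doc, rank in scored_docs if rank >= cutoff]
    return [doc for doc, _ in sorted(kept, key=lambda x: x[1], reverse=True)]

docs = [
    "Customer asked about SAML support for their identity provider.",
    "Notes on the new billing dashboard.",
]
candidates = keyword_filter(docs, "Single Sign-On", expand_keywords("Single Sign-On"))
# candidates -> only the SAML document; it would then go to the Search endpoint (Step 3)
```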
The Primary Problem
Step 3 has a tendency to miss documents that should match (or to rank them low), specifically when a document talks about the search topic in an indirect, domain-specific way (i.e. it never explicitly uses the search phrase).
Given the domain-specific performance I’ve seen from the fine-tuned babbage model, I would love to be able to use it in the Search step.
Since fine-tuned models can only be used for Completions, is there any way to replicate Search through the Completions API?
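One approximation I’ve seen suggested (not confirmed to be the Search endpoint’s actual, undisclosed prompt) is to call Completions with `echo=True`, `logprobs=0`, `max_tokens=0` so the API returns per-token log probabilities of the prompt itself, then compare how likely the query tokens are following the document versus following a generic context. The scoring arithmetic is sketched below; the API calls themselves are only described in comments, and the prompt layout is an assumption.

```python
# Hedged sketch: scoring a (document, query) pair from Completions logprobs.
# Assumed procedure: call the Completions API twice with echo=True,
# logprobs=0, max_tokens=0 -- once on "{document}\n\n{query}" and once on
# "{generic_context}\n\n{query}" -- and collect the token_logprobs lists
# from each response. A fine-tuned model works here because these are
# plain Completions calls.

def query_logprob(token_logprobs: list[float], n_query_tokens: int) -> float:
    """Mean logprob of the final n_query_tokens of an echoed prompt."""
    tail = token_logprobs[-n_query_tokens:]
    return sum(tail) / len(tail)

def search_score(doc_logprobs: list[float],
                 generic_logprobs: list[float],
                 n_query_tokens: int) -> float:
    """How much more likely the query is given the document than given a
    generic context. Higher = more relevant. The official Search score is
    on a different scale; this is only a relative ranking signal."""
    return (query_logprob(doc_logprobs, n_query_tokens)
            - query_logprob(generic_logprobs, n_query_tokens))
```

Ranking the candidate documents by `search_score` would then replace Step 3, at the cost of two Completions calls per document.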
For Step 2, I’ve considered fine-tuning a BERT-based semantic similarity model on the KB (as mentioned in this post). That way, we wouldn’t need Step 1 and the initial filtering would hopefully be more complete. It would also be possible to just perform Step 2 (and skip Step 3) if the model was accurate enough (although I’ve used a non-fine-tuned RoBERTa in the past for a different problem and it didn’t seem as effective as GPT-3). I would prefer to stick with GPT-3 where possible given its power in understanding text.
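If the fine-tuned similarity-model route worked, Steps 1–3 could collapse into one embedding comparison: embed the search phrase and the documents, then rank by cosine similarity. A minimal pure-Python sketch of that ranking step, assuming the embeddings come from the fine-tuned encoder (not shown):

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def embedding_rank(query_vec: list[float],
                   doc_vecs: list[list[float]],
                   threshold: float = 0.0) -> list[int]:
    """Return indices of documents whose similarity to the query meets
    the threshold, most similar first. The threshold plays the same role
    as the Rank < 200 cutoff in Step 4."""
    sims = [cosine(query_vec, v) for v in doc_vecs]
    order = sorted(range(len(sims)), key=lambda i: -sims[i])
    return [i for i in order if sims[i] >= threshold]
```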
I’ve considered treating this as a classification task (by scoping the “search terms” to a fixed set of features). However, gathering the examples needed to train the classifier would not be scalable, and a roadmap’s feature set changes often enough that classification would have to be re-run constantly (which makes it computationally similar to search).