Reducing the Cost of GPT-4 by Using Embeddings

One thing I’d add to this great discussion: I’ve been seeing pretty good results for Q&A using embeddings with text-babbage-001, which costs a fraction of gpt-3.5-turbo. If all you’re doing is question answering, I’d recommend giving that model a try to see if it’s “good enough” for your needs.
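In case it helps anyone, here’s a minimal sketch of that setup using the pre-1.0 `openai` Python library; the `docs` list and the prompt template are stand-ins for your own data:

```python
import numpy as np
import openai  # pip install openai==0.28; reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    """Embed a string with ada-002 (cheap, good enough for retrieval)."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Placeholder corpus: embed your own documents once, up front.
docs = ["Refunds are processed within 5 days.", "Support hours are 9-5 EST."]
doc_vecs = [embed(d) for d in docs]

def answer(question: str) -> str:
    q = embed(question)
    # ada-002 vectors are unit length, so a dot product is cosine similarity.
    best = max(range(len(docs)), key=lambda i: float(q @ doc_vecs[i]))
    prompt = f"Context: {docs[best]}\n\nQuestion: {question}\nAnswer:"
    resp = openai.Completion.create(
        model="text-babbage-001",  # far cheaper than gpt-3.5-turbo
        prompt=prompt,
        max_tokens=100,
        temperature=0,
    )
    return resp["choices"][0]["text"].strip()
```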


Having tackled a very similar problem for the past six months, the approach we’ve arrived at (for now) is to break the chat flow into subject chunks, using the AI to categorise them, and then backfill with database queries to grab the data. The end result is a one-shot system that tells the model what the user is talking about and provides the info up front, via a final system prompt and a fake user message. Not ideal, and the ability to embed a mass of data would be the dream ticket, but it’s not available just yet.
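To make the flow concrete, here’s a minimal sketch of how I’d lay that out; `categorise`, `lookup_data`, and the subject labels are all placeholders for your own classifier and database:

```python
import openai  # pre-1.0 API; reads OPENAI_API_KEY from the environment

def categorise(user_message: str) -> str:
    """One-shot call asking the model to label the subject of the message."""
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Reply with one word: the subject of "
             "the user's message (billing, shipping, returns, other)."},
            {"role": "user", "content": user_message},
        ],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"].strip().lower()

def lookup_data(subject: str) -> str:
    # Placeholder: your database query keyed on the detected subject.
    return f"[records for {subject} fetched from the database]"

def reply(user_message: str) -> str:
    subject = categorise(user_message)
    context = lookup_data(subject)
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a support assistant."},
            # Final system prompt injects the backfilled data up front...
            {"role": "system", "content": f"Relevant data: {context}"},
            # ...followed by the (possibly synthesised) user message.
            {"role": "user", "content": user_message},
        ],
    )
    return resp["choices"][0]["message"]["content"]
```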

I’m a big fan of embeddings, but often the embedding of the raw input string from a user doesn’t match well against the embeddings computed on my proprietary data. I started messing around with keyword extraction on the user query, then using those keywords (embedded) to search against the document embeddings. Has anyone tried this approach, or do you have another one?
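For reference, the keyword variant looks roughly like this; the stopword filter here is a deliberately naive stand-in for real keyword extraction:

```python
import numpy as np
import openai  # pre-1.0 API; reads OPENAI_API_KEY from the environment

STOPWORDS = {"the", "a", "an", "is", "are", "how", "do", "i", "my", "to", "on", "what"}

def extract_keywords(query: str) -> str:
    """Naive keyword extraction: drop stopwords, keep content words."""
    return " ".join(w for w in query.lower().split() if w not in STOPWORDS)

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Embed the keyword string rather than the raw query before searching.
query = "how do I change the billing address on my account"
keyword_vec = embed(extract_keywords(query))  # "change billing address account"
```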

Sounds like a perfect match for sparse/dense embeddings.
You can apply a weight of your choosing (for example, 70% keywords (sparse) / 30% semantics (dense)), or simply use the sparse match as a backup if the dense match score is too low.
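A rough sketch of that weighting, using `rank_bm25` for the sparse side (the 0.7/0.3 split, the corpus, and the normalisation are just illustrative):

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

docs = ["Refunds are processed within 5 days.", "Support hours are 9-5 EST."]
bm25 = BM25Okapi([d.lower().split() for d in docs])

def hybrid_scores(query: str, query_vec: np.ndarray,
                  doc_vecs: list, alpha: float = 0.7) -> np.ndarray:
    """Blend sparse (keyword) and dense (semantic) scores; alpha weights sparse."""
    sparse = bm25.get_scores(query.lower().split())
    # Normalise BM25 scores to [0, 1] so they're comparable to cosine similarity.
    if sparse.max() > 0:
        sparse = sparse / sparse.max()
    dense = np.array([float(query_vec @ v) for v in doc_vecs])
    # Backup variant: if dense.max() falls below a threshold, return sparse alone.
    return alpha * sparse + (1 - alpha) * dense
```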

There are also various preprocessing techniques you can apply to the raw query to enhance its effectiveness and alignment with your data.
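For example, a couple of cheap normalisation steps before embedding (entirely illustrative; domain-specific expansions or spelling fixes would slot in here too):

```python
import re

def preprocess_query(query: str) -> str:
    """Cheap normalisation before embedding: lowercase, strip punctuation,
    collapse whitespace."""
    query = query.lower()
    query = re.sub(r"[^\w\s]", " ", query)  # drop punctuation
    return re.sub(r"\s+", " ", query).strip()

print(preprocess_query("How do I reset my password??"))
# -> "how do i reset my password"
```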
