Reducing Cost of GPT 4 by using embeddings

Hi all,
I need help with reducing my costs.
I am trying to use GPT models to generate taxonomies. The inputs are product titles and descriptions.
So far I am getting the best results with GPT-4, but right now we can’t fine-tune it. Kindly correct me if I am wrong…
With GPT-3 Davinci, I get somewhat good results after fine-tuning, but I have around 1.5 million products, so fine-tuning on all of them will cost me more than 30,000 USD.
One way, which I recently tried, is to use embeddings to build and save an index, and then use that index to send a better prompt. Is this method good? Will I save cost if I save an index for all products?

Kindly help me with this…



It is if you are getting good results in your tests. Are you using a test protocol to see how well your embeddings examples are performing?
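One possible shape for such a test protocol, as a minimal sketch: hold back a sample of products whose taxonomy is already known, run them through whatever classifier you are testing, and measure top-1 accuracy. The `classify` callable, the keyword-based `toy_classify` stand-in, and the sample rows are all hypothetical placeholders, not anything taken from the poster's data.

```python
from typing import Callable, Iterable, Tuple


def top1_accuracy(
    labeled_rows: Iterable[Tuple[str, str]],
    classify: Callable[[str], str],
) -> float:
    """Fraction of (product_text, expected_taxonomy) rows classified correctly."""
    total = 0
    correct = 0
    for product_text, expected in labeled_rows:
        total += 1
        if classify(product_text) == expected:
            correct += 1
    # An empty sample has no meaningful accuracy; return 0.0 instead of dividing by zero.
    return correct / total if total else 0.0


# Hypothetical stand-in for the embeddings-based classifier under test.
def toy_classify(product_text: str) -> str:
    return "Bearings" if "bearing" in product_text.lower() else "Chains and Sprockets"


# Made-up labeled sample: (product text, known taxonomy).
sample = [
    ("Mounted roller bearing, 2 inch bore", "Bearings"),
    ("Roller chain sprocket, 40 pitch", "Chains and Sprockets"),
    ("Pillow block housing", "Bearings"),
]
accuracy = top1_accuracy(sample, toy_classify)
```

Running the same labeled sample after every change to the index or the prompt makes it possible to tell whether a cheaper setup is actually holding its quality.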

I think embeddings are about 1/60th the cost of alternative OpenAI approaches.

Perhaps I’m dreaming, but isn’t there a way to first classify the product data using existing [known] keywords, so that an embeddings solution functions as a meta-inference that gets inquirers into the ballpark, and the last mile is then served by a semantic or inverted search index, avoiding the need to map every product through the LLM?


For your use case, definitely use embeddings. If you fine tune every time you have new data you will have to retrain which is not efficient. You can use langchain or Haystack with a vector database. Will be much cheaper with same results if done correctly.


Fine-tuning is not meant to inject knowledge into the model; it’s only meant for the model to learn patterns on how to respond to certain types of questions (how, but not what). Embeddings are the only reliable (and yes, cost-effective) way to insert any kind of domain-specific knowledge. You could even use a different product to do the vectorization (or any kind of semantic search) and save even more. Retrieval of relevant information is one thing (be it via OpenAI’s embeddings, other embeddings, or search engines), and calling the LLM with the retrieved information as part of the prompt is a different step altogether. Hope this helps clarify.


Indeed, and we need to be careful with this wording. Embeddings don’t really insert anything into an LLM either; rather, they allow us to externally develop and manage similarity pointers so that, when compared to other pointers, we can say these things are alike in a relative sense. The best read on this topic is …


Thanks @bill.french for helping me out.
But I am still confused about how to create embeddings. I used the following piece of code (in image).
But I think that for every query it is attaching the whole embedding (of which, in my case, there is perhaps only 1).

I have data in csvs, where I have product Title, Description etc.
Currently we have around 3000 different taxonomies, which are multi-level. Some taxonomies can have as little data as 1 row, and others can have several thousand rows.
So should I create one large text file from all rows, as you can see in the image, which is what I did previously?

Or do I have to create separate text files for every taxonomy, and then create the index?

How will it work?

I bet you don’t need fine-tuning to accomplish what you need. It sounds like you already tried that, so I don’t want to assume, but I’ve had great success with complex mapping of AICP bids (really nasty multi-indexed tables and awkward taxonomical inconsistencies), and it seems to work fine without fine-tuning. YMMV. Otherwise, you’d probably have good luck putting your embeddings into an index, as mentioned, and querying them with a low threshold.


Thanks @aledandrea for the explanation.
For example I have to query:
What is the taxonomy of the product with Title: “XYZ” and Description: “WYZ”? How can I, using OpenAI, most effectively provide the relevant information in the form of embeddings?

I think I mistakenly created embeddings by putting all the text data in one text document, when I should have provided separate documents for different taxonomies…

Yes @ZAdam, I tried fine-tuning with a small subset of data, but I am getting around 30% better results from GPT-4 without fine-tuning, compared to the fine-tuned GPT-3 Davinci.
Can you refer me to some resource which I can use to make the best use of embeddings? It seems you have got great results this way…

Thanks a lot for your help…

Also, can we pass an embedding-based prompt to GPT-4? If so, how…

OpenAI doesn’t offer the ability to ingest embeddings and do anything with them. They suggest other companies, like Pinecone I think, that will store them for you and do the calculations. OpenAI only lets us send a string and receive back an embedding (an array of floats).
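Put differently, the vectors never go to GPT-4; they only help choose which text gets pasted into the prompt. A rough sketch of that last step follows. `build_prompt`, the prompt wording, and the example row are made up for illustration, and the actual chat API call is deliberately left out.

```python
from typing import List, Tuple


def build_prompt(title: str, description: str,
                 similar_examples: List[Tuple[str, str]]) -> str:
    """Assemble a plain-text prompt from examples picked by embedding similarity.

    similar_examples holds (product_title, taxonomy) pairs that a similarity
    search has already selected; only their text is sent to the model.
    """
    lines = ["Assign a taxonomy to the product, using the examples as a guide.", ""]
    for example_title, taxonomy in similar_examples:
        lines.append(f"Product: {example_title}")
        lines.append(f"Taxonomy: {taxonomy}")
        lines.append("")
    # The product to classify goes last, with the answer slot left open.
    lines.append(f"Product: {title}")
    lines.append(f"Description: {description}")
    lines.append("Taxonomy:")
    return "\n".join(lines)


# Hypothetical example row, standing in for whatever the similarity search returned.
prompt = build_prompt(
    "XYZ",
    "WYZ",
    [("Mounted roller bearing", "Bearings > Mounted Bearings")],
)
```

The resulting string is what would be sent as the message content; the embeddings themselves stay in your own storage.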

LlamaIndex is a good resource:

Ultimately, you’ll need to use a vector DB like Pinecone to store the embeddings. It’s trivially simple to store and query…

# store docs to an index
# (`documents` is assumed to be a list of documents you have already loaded)
from llama_index import GPTSimpleVectorIndex

index = GPTSimpleVectorIndex([])
for doc in documents:
    index.insert(doc)

# Query the index
response = index.query("What did the author do growing up?")

That’s the simplest case. Store your taxonomy mappings as a simple store of documents. Then, when you pass in a few hundred need-to-process lines, the first query you make is to the index, to get only the needed taxonomies using a lower threshold (you can tune the threshold per query). Finally, you feed those into the GPT-4 API. No need to send a full taxonomy into the model each time; only the matching taxonomies.
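The threshold step could look something like the sketch below, which assumes the index query has already produced a similarity score per taxonomy. `select_taxonomies`, the default threshold of 0.75, the cap of 20 results, and the scores are illustrative assumptions, not values from any library.

```python
from typing import Dict, List


def select_taxonomies(scores: Dict[str, float], threshold: float = 0.75,
                      max_results: int = 20) -> List[str]:
    """Keep taxonomies whose similarity score meets the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    kept = [name for name, score in ranked if score >= threshold]
    if not kept and ranked:
        # Nothing cleared the bar: fall back to the single best match.
        kept = [ranked[0][0]]
    # Cap the list so the prompt stays small even with a permissive threshold.
    return kept[:max_results]


# Hypothetical similarity scores returned by an index query for one product.
scores = {"Bearings": 0.91, "Mounted Bearings": 0.88, "Sprockets": 0.42}
matches = select_taxonomies(scores, threshold=0.8)
```

Lowering the threshold sends more candidate taxonomies to the model (higher cost, fewer misses); raising it does the opposite, so it is worth tuning against a labeled sample.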


Thanks @ZAdam
So for this, first of all, I have to create separate documents (text files) for each taxonomy, and then create an index for all those documents. Only that way will I be able to avoid sending a full taxonomy into the model each time, and send only the matching taxonomies.

Yes, that’s the entire premise of using it with LLMs. It stores your text as embedding values, and then Pinecone (for example) uses a simple cosine similarity function to see how similar your query is to the entries in the database, and retrieves the most relevant objects.
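For a sense of what that similarity lookup does, here is a minimal pure-Python sketch. It is not Pinecone's implementation; `most_similar` is a hypothetical helper, and the 3-dimensional vectors are toy values standing in for real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float], stored: Dict[str, Sequence[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored items with the highest cosine similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; real embeddings have many more dimensions.
stored = {
    "Bearings": [0.9, 0.1, 0.0],
    "Chains and Sprockets": [0.1, 0.9, 0.0],
    "Fasteners": [0.0, 0.2, 0.9],
}
ranked = most_similar([0.8, 0.2, 0.1], stored, k=2)
```

A vector database does the same comparison at scale, usually with approximate search so it does not have to score every stored vector.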

You can also store elements within a document… an index of indexes… like a library with books that themselves have indices. It’s up to you how to do that… probably batch like-taxonomies?

Outside of vector indexing, consider this:

GPT is not just useful for the explicit task you need; you can also call it multiple times in “the background.” So, maybe you decide to batch feed 1000 rows in for taxonomy mapping… first, you might serve GPT the whole list of dirty rows, and have it respond with the pandas indexes of the taxonomies you’ll need; then feed it some condensed or summarized index of your taxonomies. Maybe even ask GPT to help to structure that summary. 1 query to GPT to get you the indexes you need for the batch; a 2nd query to GPT to do the heavy lifting with mapping.
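One way to read that two-query idea, as a sketch only: the model call is injected as a plain callable so the flow can be tried offline. `two_pass_mapping`, the prompt wording, the comma-separated reply format, and `fake_model` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def two_pass_mapping(rows: List[str], taxonomy_summary: Dict[int, str],
                     taxonomy_details: Dict[int, str],
                     ask_model: Callable[[str], str]) -> str:
    """First ask which taxonomy ids a batch needs, then map rows using only those."""
    summary_text = "\n".join(f"{i}: {name}" for i, name in taxonomy_summary.items())
    rows_text = "\n".join(rows)

    # Pass 1: the model sees only the condensed summary and returns ids.
    first_reply = ask_model(
        "List the ids of the taxonomies relevant to these products, "
        "as comma-separated integers.\n\n"
        f"Taxonomies:\n{summary_text}\n\nProducts:\n{rows_text}"
    )
    wanted = []
    for token in first_reply.split(","):
        token = token.strip()
        # Ignore anything that is not a known id, since model output can be messy.
        if token.isdigit() and int(token) in taxonomy_details:
            wanted.append(int(token))

    # Pass 2: only the selected taxonomies are sent in full.
    details_text = "\n".join(f"{i}: {taxonomy_details[i]}" for i in wanted)
    return ask_model(
        "Map each product to one of these taxonomies.\n\n"
        f"Taxonomies:\n{details_text}\n\nProducts:\n{rows_text}"
    )


# Stand-in for a real chat-completion call, so the sketch runs offline.
prompts_sent: List[str] = []


def fake_model(prompt: str) -> str:
    prompts_sent.append(prompt)
    if len(prompts_sent) == 1:
        return "0, 2, banana"  # messy first-pass reply with one junk token
    return "Pillow block housing -> 0"


summary = {0: "Bearings", 1: "Chains", 2: "Fasteners"}
details = {
    0: "Bearings > Mounted Bearings",
    1: "Chains and Sprockets > Sprockets",
    2: "Fasteners > Bolts",
}
result = two_pass_mapping(["Pillow block housing"], summary, details, fake_model)
```

The saving comes from the second prompt carrying only the taxonomies the first pass asked for, rather than the full list.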


Thanks @ZAdam
But it is quite complex, and I am not able to grasp much of it yet. I will explore this library and try to understand it as best I can. I don’t want to bother you further 🙂
I will come back here after doing some research, and experimentation. I am very grateful for your help.

If you have come across, or later come across, any helpful blogs/videos, kindly do share them with me.


Personally, I think starting with a service like Pinecone just adds another layer of complexity on top while you are trying to learn. I think you should start with just your PC and some .txt files.

Here is what I would do
Bearings.txt contains the embedding data for the string “bearings”
Chains and sprockets.txt contains the embedding data for the string “chains and sprockets”

Okay, now take your product,
TIMKEN SAF 528 SRB Pillow Block Housing Only.txt contains the embedding for that term

Calculate the cosine similarity between the data in bearings.txt and the data in TIMKEN.txt
Also, calculate the cosine similarity between chains and sprockets.txt and TIMKEN.txt
Hopefully, the score will be higher for the first one. So now you know your TIMKEN SAF 528 Pillow Block Housing probably belongs in the Bearings category.

Now do the calculations against the sub-categories of Bearings, like Mounted Bearings, to find out which type of bearing the product belongs to. Whichever of those scores highest, calculate against its sub-categories too (Mounted Roller Bearings), and so on until you have drilled down completely and there are no sub-categories left. That would be one approach to finding what taxonomy a product should belong to.
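The drill-down described above can be sketched as follows. The nested-dict tree layout, the `drill_down` helper, the "Unmounted Bearings" category, and all the vectors are invented for the example; real vectors would come from an embeddings call for each category name.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drill_down(product_vec: Sequence[float], tree: Dict[str, dict],
               vectors: Dict[str, Sequence[float]]) -> List[str]:
    """Follow the best-scoring child at each level until a leaf is reached."""
    path: List[str] = []
    level = tree
    while level:  # an empty dict means there are no sub-categories left
        best = max(level, key=lambda name: cosine_similarity(product_vec, vectors[name]))
        path.append(best)
        level = level[best]
    return path


# Toy category tree: each key maps to a dict of its sub-categories.
tree = {
    "Bearings": {
        "Mounted Bearings": {"Mounted Roller Bearings": {}},
        "Unmounted Bearings": {},
    },
    "Chains and Sprockets": {"Sprockets": {}},
}
# Made-up 3-dimensional vectors standing in for real category embeddings.
vectors = {
    "Bearings": [1.0, 0.0, 0.0],
    "Chains and Sprockets": [0.0, 1.0, 0.0],
    "Mounted Bearings": [0.9, 0.0, 0.4],
    "Unmounted Bearings": [0.9, 0.0, -0.4],
    "Mounted Roller Bearings": [0.8, 0.0, 0.6],
    "Sprockets": [0.0, 0.9, 0.3],
}
product_vec = [0.85, 0.05, 0.5]  # stand-in for the product's embedding
path = drill_down(product_vec, tree, vectors)
```

One thing to keep in mind with this approach: a wrong choice at a top level cannot be corrected further down, because the other branches are never scored.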

A different way to do it, instead of checking top categories and then their child categories, could be to do it in one shot, like this:
Check TIMKEN.txt against
Bearings, Mounted Bearings, Mounted Roller Bearings, Housings.txt
Chains and sprockets, Sprockets, Roller Chain Sprockets, Finished Bore Sprockets.txt

So rather than drilling down, you just compare against each fine-grained category in turn. I think the second would give better results, and it would be simpler.
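The full-path labels for that one-shot comparison can be generated from the category tree rather than typed by hand. A small sketch, assuming the taxonomy is held as nested dicts (`leaf_paths` is a hypothetical helper); each returned string is what you would embed once and then compare against the product's embedding.

```python
from typing import Dict, List, Optional


def leaf_paths(tree: Dict[str, dict], prefix: Optional[List[str]] = None) -> List[str]:
    """Return one comma-joined label per leaf category, root first."""
    prefix = prefix or []
    labels: List[str] = []
    for name, children in tree.items():
        path = prefix + [name]
        if children:
            # Not a leaf yet: keep descending and carry the path so far.
            labels.extend(leaf_paths(children, path))
        else:
            labels.append(", ".join(path))
    return labels


# The two example paths from the post, expressed as a nested tree.
tree = {
    "Bearings": {
        "Mounted Bearings": {"Mounted Roller Bearings": {"Housings": {}}},
    },
    "Chains and sprockets": {
        "Sprockets": {"Roller Chain Sprockets": {"Finished Bore Sprockets": {}}},
    },
}
labels = leaf_paths(tree)
```

With around 3000 taxonomies this is only a few thousand category embeddings to compute and store, and each product then needs a single embedding plus a similarity scan.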


That’s not even remotely like my embeddings code (this is Google Apps Script BTW, V8 ecma 5 I think).

50% of good AI solution development is all the non-AI stuff. Like database management, enumerations with precision, etc. Make sure you have a process that supports calls into OpenAI with precision, or suffer the possible consequences of higher costs.


Thanks a lot @ZAdam, @bill.french, and all others.
I am getting awesome results using embeddings, and my cost has also gone way down. I started with langchain, but then used a from-scratch embeddings workflow, as that gives more control…

Thanks all of you, again…


Thanks for coming back to let us know. Glad you got it all sorted.
