Reducing the cost of GPT-4 by using embeddings

LlamaIndex is a good resource:

https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html

Ultimately, at scale you’ll want a vector DB like Pinecone to store the embeddings, but LlamaIndex’s built-in in-memory index makes it trivially simple to store and query:

# store docs to an index
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# load your taxonomy documents (here from a local "data" folder)
documents = SimpleDirectoryReader("data").load_data()

index = GPTSimpleVectorIndex([])
for doc in documents:
    index.insert(doc)
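Each insert call chunks the document and embeds it (with OpenAI’s text-embedding-ada-002 by default, at the time of writing), so this one-time indexing step is where the embedding cost is paid; lookups afterwards are cheap.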

Then query the index:

# the index embeds the question, retrieves the most similar chunks,
# and passes only those to the LLM to synthesize an answer
response = index.query("What did the author do growing up?")
print(response)
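Since you won’t want to re-embed the taxonomy on every run, you can persist the index to disk and reload it later. A minimal sketch using the save_to_disk/load_from_disk methods from the same LlamaIndex API (the file name is illustrative):

# build once, persist, and reuse across runs
index.save_to_disk("taxonomy_index.json")

# later, in the processing job
from llama_index import GPTSimpleVectorIndex
index = GPTSimpleVectorIndex.load_from_disk("taxonomy_index.json")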

That’s the simplest case. Store your taxonomy mappings as documents in the index. Then, for each batch of a few hundred lines you need to process, the first query goes to the index to retrieve only the relevant taxonomies, using a similarity threshold you can tune, and finally you feed those into the GPT-4 API. There’s no need to send the full taxonomy to the model each time; only the matching taxonomy entries.
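To make that two-step flow concrete, here’s a minimal sketch assuming the pre-0.5 LlamaIndex query API (similarity_top_k, response_mode="no_text", source_nodes) and the openai-python ChatCompletion endpoint; classify_lines, the 0.75 cutoff, and the prompt wording are illustrative, not part of the original recipe:

import openai

SIMILARITY_CUTOFF = 0.75  # illustrative; tune against your own data

def classify_lines(index, lines):
    # Step 1: retrieve only the taxonomy entries relevant to this batch.
    # response_mode="no_text" skips answer synthesis, so this step costs
    # an embedding lookup rather than an LLM call.
    retrieved = index.query(
        "\n".join(lines),
        similarity_top_k=10,
        response_mode="no_text",
    )
    matched = [
        node.source_text
        for node in retrieved.source_nodes
        if node.similarity is None or node.similarity >= SIMILARITY_CUTOFF
    ]

    # Step 2: send just the matching entries plus the lines to GPT-4,
    # instead of the full taxonomy every time.
    completion = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "Classify each input line against these taxonomies:\n"
                + "\n".join(matched),
            },
            {"role": "user", "content": "\n".join(lines)},
        ],
    )
    return completion.choices[0].message["content"]

The tighter the cutoff, the fewer taxonomy entries go into the prompt, which is exactly where the GPT-4 token savings come from.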
