Do I misunderstand retrieval pricing?

So, I was looking for ways to reduce the cost of my OpenAI integration and found the data retrieval option. I’m considering putting all my questions and data into a markdown file and asking a GPT assistant to return a JSON with every answer. Then I’d immediately dispose of the file and assistant to avoid being charged for additional days of storage. I tested this, and it seems to produce good results reliably.

The cost seems minimal, since the document’s contents aren’t counted as tokens (or are they? That’d explain a lot). The only costs it’d incur are the file itself, at 20 cents per GB (~167M tokens), plus a negligibly short request along the lines of “answer every question in the file and return a JSON”. That works out to roughly 8.35M tokens per cent, 8350 times cheaper than the 1 cent / 1K tokens of conventional requests.
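To spell out my arithmetic (the tokens-per-GB figure is just my rough estimate):

```python
# Rough comparison, assuming ~167M tokens per GB (my estimate) and the
# prices above; not an official calculation.
storage_cost_per_gb_day = 0.20   # dollars per GB per day of retrieval storage
tokens_per_gb = 167_000_000      # rough estimate, ~6 bytes per token

# Tokens "covered" per cent with each approach
retrieval_tokens_per_cent = tokens_per_gb / (storage_cost_per_gb_day * 100)  # 8,350,000
completion_tokens_per_cent = 1_000  # at 1 cent per 1K tokens

print(retrieval_tokens_per_cent / completion_tokens_per_cent)  # ~8350x cheaper?
```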

I find this hard to believe, because if it were true, why would anyone ever send conventional requests instead of just using assistants + retrieval? Or, for that matter, why wouldn’t OpenAI just apply this automatically to long enough Chat Completion requests, reducing the cost drastically? What am I missing?

Thanks in advance!

Hi and welcome to the Developer Forum!

What you are describing is the use of “embeddings”. These are pre-processed chunks of text that can be intelligently searched and retrieved for context generation. This uses a semantic similarity search rather than a traditional word-based search, so it is very powerful at pulling back data related to the question being asked.

In its simplest form, you would “embed” the user’s query, see which of your pre-stored, pre-processed embeddings from your documentation have the closest similarity, pull back (let’s say) the top 5 of those to include as context, and then append a prompt like: given the above context, how would you answer the following: “{users_question}”.
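Here is a minimal sketch of that flow in Python, assuming the v1 openai client and numpy; the chunk texts, model names, and top-5 cutoff are placeholders for your own setup:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    # Embed a list of strings; returns one vector per input text
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in response.data])

# In a real system you'd chunk your docs and store these embeddings once
documents = ["chunk one of your docs...", "chunk two...", "chunk three..."]
doc_embeddings = embed(documents)

users_question = "How do I reduce my API costs?"
query_embedding = embed([users_question])[0]

# Cosine similarity between the query and every stored chunk
scores = doc_embeddings @ query_embedding / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
top_chunks = [documents[i] for i in np.argsort(scores)[::-1][:5]]

# Only the retrieved chunks are sent as context, so only they cost tokens
prompt = "\n".join(top_chunks) + (
    f'\n\nGiven the above context, how would you answer the following: "{users_question}"'
)
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)
```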

You still pay for the tokens used as context, but not for every token stored as embeddings. So you can have megabytes of documentation and only use kilobytes of context.

OpenAI includes this functionality as part of the Assistants system, but it is currently in very early testing and can still generate quite large contexts; this will hopefully be addressed soon. There are also external solutions for this, including ChromaDB, Weaviate, and Pinecone, to name but a few.

You can read about it here
https://platform.openai.com/docs/guides/embeddings


Thank you for the answer, I’ll look into embeddings.
