How to reduce prompt token costs

Hello everyone

I have a very long prompt (about 90k tokens).

Most of the information is about laws and car models.

What is the best way to reduce the price of each request?
I was looking for information on Google and in the OpenAI documentation.

I found embeddings, but I don’t quite understand how to use the created embeddings or how they can help me.

Hi and welcome to the Forum!

Could you share just a little bit more information about what you are trying to achieve? That would make it easier to provide some ideas/guidance. Thanks!

I’m seeking methods to lower token costs. This is a GPT assistant designed to provide taxi drivers with information on local laws and on vehicles suitable for taxi operations. However, a single completion requires around 90k prompt tokens, which is quite expensive per user message. I’m exploring ways to reduce this hefty token expense, perhaps by implementing tools that fetch laws only upon user inquiry, or by employing caching mechanisms, so that I don’t have to pay $0.50 per message.

Thanks in advance

Thanks for clarifying. If your goal is to get answers to specific questions on local laws etc., then an embeddings-based Retrieval-Augmented Generation (RAG) approach might indeed be a suitable and much more cost-effective solution.

In simplified terms, you would convert the knowledge/information used to answer questions into embedding vectors using one of OpenAI’s embedding models and then - most likely - store them in a vector database. Once that is in place, whenever you have a new query, you would convert that query too into an embedding and then identify the most similar stored embeddings, i.e. the chunks most likely to contain the information needed to answer the question. Only those retrieved chunks then go into the prompt, instead of the full 90k tokens.
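As a rough sketch of that retrieval step: the `embed` function below is a toy character-frequency stand-in for a real embedding model (in practice you would call the OpenAI embeddings API, e.g. with a model like `text-embedding-3-small`, and use a vector database rather than an in-memory list), and the example law snippets are made up for illustration:

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: a vector of character
    # frequencies. Replace with a call to the OpenAI embeddings API.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# 1. Embed each chunk of the knowledge base once, up front.
chunks = [
    "Taxi drivers must hold a valid commercial licence.",
    "Vehicles older than ten years may not be registered as taxis.",
    "Fares are metered and regulated by the city.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. At query time, embed the question and rank chunks by similarity.
def retrieve(query: str, top_k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

# 3. Only the retrieved chunks are sent to the model with the question.
print(retrieve("How old can a taxi vehicle be?"))
```

The key cost saving is in step 3: each request carries only the handful of retrieved chunks rather than the entire knowledge base.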

I tend to recommend this example from the OpenAI Cookbook to get started, as it walks you through the basic steps from creating embeddings to using them to generate answers. Note that it won’t give you all the answers for your use case - there are a lot of nuances, including how to chunk your information prior to embedding it - but it should get you started.
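To illustrate the chunking step mentioned above, here is a minimal sketch that splits a long document into overlapping character windows (the sizes and overlap here are arbitrary assumptions; in practice you would more likely chunk by tokens or by document structure such as sections and paragraphs):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Split a long document into overlapping windows so that information
    # straddling a chunk boundary is not lost to retrieval.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

# Stand-in for a long legal document.
doc = "Local taxi law, section 1. " * 40
pieces = chunk_text(doc)
print(len(pieces), len(pieces[0]))
```

Each chunk would then be embedded and stored individually, so a query can match the specific passage it needs.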

Embeddings are a frequent topic here in the Forum, so you’ll find a lot of relevant posts through the search function.
