My apologies, I answered your question in the wrong context; I was purely trying to point you in the right direction on the bit I saw. Reading back to where you started from:
The guy pointed you at the tech resource, and I think he was trying to point out that you were missing a technical fundamental which would be necessary to understand any recommendation. What I mean by this is that there are programming approaches, when interfacing with an API endpoint, that can be used to limit the number of calls/tokens. Given that we are still quite early in the evolution of these technologies, we don't have the robust tooling we find in other, more mature spaces. So early adopters are on the bleeding edge, and what do they do? They bleed. The TL;DR of that whole lot is: there isn't an easy solution to the question you asked.
After that, we need to clear up the specific comment. I didn't get context on the response to you about roles + messages; I thought you just needed to know where to find the docs, and that's what my answer was worth: zero. The reply to you implied using those messages to contain costs. I'm not certain what he was referring to and would honestly love to understand. I read it as: you use those messages/roles to focus the model and hence minimise inefficiency, but I'd love to know if that wasn't what was meant.
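For what it's worth, here is roughly what I took the roles/messages point to mean: use a `system` message to pin the model down once, so you aren't paying to repeat instructions in every turn, and cap the output tokens. A minimal sketch; the model name, the word limit, and the `max_tokens` value are illustrative assumptions, and the actual API call is left commented out since it needs a key:

```python
# Sketch: using roles to focus the model so replies stay short and on-task.
# The payload shape follows the chat-completions format; the client call is
# commented out because it needs an API key and network access.

def build_focused_request(user_text: str) -> dict:
    """Build a chat request where the system role constrains the model."""
    return {
        "model": "gpt-3.5-turbo",  # pick the cheapest model capable of the task
        "messages": [
            # The system message steers the model once, instead of repeating
            # (and paying for) the same instructions in every user message.
            {"role": "system",
             "content": "You are a concise copywriter. Reply in under 100 words."},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": 150,  # hard cap on generated (billed) output tokens
    }

request = build_focused_request("Write a tagline for a bicycle repair shop.")
# response = openai.ChatCompletion.create(**request)  # real call, needs a key
```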
There is one very obvious thing that is not too technical, and that is the cost associated with each model. Use the models appropriately, i.e. use the smallest model that is capable of your task. Don't use Davinci for arbitrary tasks, say a relatively simple sentiment analysis task; use Curie. In fact, try the other models too, as Curie is very capable, faster than Davinci, and ONE TENTH THE COST.
GPT-4 and Davinci are incredibly good at certain tasks, but if you are trying to create something similar to ChatGPT with them, it's going to cost and hurt.
- Look at the model pricing and understand the differences between the models so that you apply the correct engine to the task; don't put a V6 on a moped, very fuel inefficient. ChatGPT, i.e. gpt-3.5-turbo, is a tenth the cost of Davinci. But there is a catch, and it kind of had to be priced that way: the reason ChatGPT can hold a conversation is that you send the whole conversation with each request. The whole conversation! Which brings us to the next point.
(btw, the charge is now per 1,000 tokens. A token is similar to a word but isn't a word; I use a factor of 0.6, i.e. I treat the per-1k-token price as the cost per ~600 words.)
- Technically, you need to understand how tokens are consumed in the different use cases. First thing: you pay for all tokens consumed in the prompt AND you pay for all tokens generated in the response. So:
You ask for a business tagline and some copy; the prompt is 300 words, so you have consumed 300+ tokens.
It generates 500 words, so you have consumed another 500+ tokens.
Together that has cost 800+ tokens.
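To make that arithmetic concrete, here is a toy estimator using the 0.6 word-to-token factor from above (roughly 600 words per 1,000 tokens). The per-1k-token price in the example is a placeholder, not real pricing; check the provider's pricing page:

```python
# Toy estimator for prompt + completion cost. The 0.6 factor (~600 words per
# 1,000 tokens) and the price used below are illustrative assumptions.

WORDS_PER_TOKEN = 0.6  # i.e. 1,000 tokens ~ 600 words

def words_to_tokens(words: int) -> int:
    """Rough token estimate from a word count."""
    return round(words / WORDS_PER_TOKEN)

def request_cost(prompt_words: int, completion_words: int,
                 price_per_1k_tokens: float) -> float:
    """You pay for BOTH the prompt tokens and the generated tokens."""
    total_tokens = words_to_tokens(prompt_words) + words_to_tokens(completion_words)
    return total_tokens / 1000 * price_per_1k_tokens

# 300-word prompt + 500-word reply at a placeholder $0.002 per 1k tokens:
cost = request_cost(300, 500, 0.002)
```

Note how the 300-word prompt alone is already ~500 tokens once the factor is applied, which is why the "300+ tokens" above is a floor, not an estimate.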
Now, that is a zero-shot request: you put the request out, it responds. It serves you to request a number of variations in one go, say 3 versions of the copy, as that's cheaper than going back and forth with the same request. In fact this training could help you:
ChatGPT Prompt Engineering for Developers - DeepLearning.AI
The catch to the above is that, to support a chat, you send the older comments in the conversation along with the request so that the model has the context of the conversation. That's a lot of tokens, and if you are chatting to Davinci, it's a waste in most cases.
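One way to contain that is to trim the history you send: keep the system message plus only the most recent turns that fit a budget. A rough sketch; real code would count tokens with a proper tokenizer, whereas this uses a crude word count as a stand-in, and the budget value is an arbitrary assumption:

```python
# Sketch: trim chat history before each request so you don't pay to resend
# the entire conversation every turn. Word count stands in for a tokenizer.

def trim_history(messages: list[dict], max_words: int = 400) -> list[dict]:
    """Keep the system message plus the most recent turns within budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    kept: list[dict] = []
    budget = max_words
    for msg in reversed(turns):  # walk newest to oldest
        words = len(msg["content"].split())
        if words > budget:
            break  # older turns no longer fit; drop them
        kept.append(msg)
        budget -= words
    return system + list(reversed(kept))  # restore chronological order
```

The trade-off is that the model forgets whatever you drop, so for long-running chats people often summarise the older turns instead of discarding them outright.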
- Use the correct model for the task.
- Learn to prompt to get what you want efficiently.
- Consider your use case: do you really need an entire conversation to generate an email?
(Aside: I heard some idiot said anybody could have done ChatGPT before OpenAI. Technically correct, but it would be good if that individual did not assume it didn't come from the channel because of a lack of imagination rather than a lack of BIG BLUE funding; we don't have blank token cheques. And if you think you know who I mean, you do. Farkin eejit.)
Hope that helps. If not, I'm happy to jump on a call for a few minutes to understand what you're up to, but I'm no guru or anything.