Using the API? Here's how you can save up to 30% and increase reliability

Hey there! After building a few tools on top of the OpenAI API, I noticed a few things that many folks might not be aware of, things that could be adding up to 30% in unnecessary costs to your OpenAI usage. So, here’s what I’ve learned:

  1. Ensure your JSON is as lean as possible: OpenAI bills per token, and that includes whitespace and line breaks in your JSON responses. If you eliminate these extras both in sending and receiving data, you might save up to 30%! You simply need to tell OpenAI to “return JSON in a single line without whitespace”. Boom!
  2. Set temperature to 0 for structured responses: When expecting a structured response (like JSON), setting the temperature parameter to 0 helps the model stick strictly to your expected JSON structure. This helps prevent cases where you expect JSON, but something goes wrong and OpenAI responds with “Sorry, I am not sure I can …”.
  3. Robots don’t need you to be polite: Computers understand simple instructions well. Trimming redundant/filler words from your prompt can not only save money but also speed up execution. Words like “please”, “kindly”, “really”, “very”, and so on, can often be dropped without losing accuracy.

Tip: You can use the OpenAI Tokenizer to count the tokens of your requests/responses.
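Putting the three tips together, the request body for the chat completions endpoint might look something like this sketch (the model name and prompt wording are illustrative assumptions; whether the savings reach 30% is the claim above, not something shown here):

```python
import json

# Sketch of a chat completions request body; "gpt-3.5-turbo" and the
# prompt wording are illustrative assumptions, not a tested recipe.
request_body = {
    "model": "gpt-3.5-turbo",
    "temperature": 0,  # tip 2: deterministic output for structured responses
    "messages": [
        {
            "role": "system",
            # tips 1 and 3: a terse instruction asking for single-line JSON
            "content": "Return JSON in a single line without whitespace.",
        },
        {"role": "user", "content": "Extract name and age from: Bob is 34."},
    ],
}

# Tip 1 applied on our own side: json.dumps with separators=(",", ":")
# strips every space after ':' and ',' from the serialized body.
wire = json.dumps(request_body, separators=(",", ":"))
print(wire)
```

You can paste the resulting string into the Tokenizer to see the count for yourself.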

3 Likes

I have heard, but not verified, that YAML is easier for GPT to generate, AND is more compact than JSON, thus requiring fewer of those expensive output tokens.

1 Like

Quite the opposite. Indentation is a core part of YAML. The larger the payload, the more redundant spaces there are. You literally cannot minify YAML, because stripping the indentation renders it invalid. Single-line JSON or CSV is best for cost efficiency.

Take a sample JSON object (e.g. from the Pokemon API) and count the tokens using the Tokenizer. Then compare a YAML version of it vs. a minified JSON version of it: JSON will be cheaper in every scenario, and the gap will widen as the object grows.

1 Like

Yeah, I wondered about that.

On the other hand, if there isn’t a whole lot of indentation and the YAML is fairly flat, then we avoid the need for all the brace, colon, quote, and comma tokens used in JSON.

Fair enough, you can’t minify YAML.
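As a rough illustration of the flat-data point, a flat object can indeed be fewer characters in YAML than in minified JSON, because the quotes and braces outweigh the newlines. Character count is only a proxy here; actual token counts depend on the BPE and should be checked in the Tokenizer. The object below is a made-up example, not actual Pokemon API output:

```python
import json

obj = {"name": "pikachu", "id": 25, "types": ["electric"], "base_experience": 112}

# Minified JSON: no inter-token whitespace, but every key and string
# value still needs quotes, plus the surrounding braces and brackets.
minified_json = json.dumps(obj, separators=(",", ":"))

# Hand-written YAML equivalent of the same flat object; indentation is
# minimal because there is almost no nesting.
flat_yaml = (
    "name: pikachu\n"
    "id: 25\n"
    "types:\n"
    "- electric\n"
    "base_experience: 112\n"
)

print(len(minified_json), len(flat_yaml))  # the YAML version is shorter here
```

As the object nests deeper, YAML’s mandatory indentation grows with every level, which is where minified JSON pulls ahead.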

1 Like

Instead of using indentation, you can use runs of dashes (‘---------’, for example) with YAML.

The text input into the tokenizer is:

---------------
-------
--
---------------

And it recognizes each line as distinct, even with an unequal number of characters per line.

Results in:

Tokens: 7
Characters: 42

1 Like
  1. Ensure your JSON is as lean as possible: OpenAI bills per token, and that includes whitespaces and line breaks in your JSON responses. If you eliminate these extras both in sending and receiving data, you might save up to 30%! You simply need to tell OpenAI to “return JSON in a single-line without whitespaces”. Boom!

Is the JSON you mentioned the OpenAI API response JSON?

If not, what kind of JSON?

Just as a heads up, the cl100k_base token set used in the chat models has whitespace and formatting tokens for just about every common variation, so most of it takes a single token.

1 Like

Also, if you’re dealing with data that is often repeated, a database of hashed inputs and outputs may save you a lot…
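A minimal sketch of that idea using only the standard library: hash the normalized prompt, and on a repeat hit return the stored completion instead of calling the API. `fake_model_call` is a stand-in for the real API call, not an actual client function:

```python
import hashlib

cache: dict[str, str] = {}

def fake_model_call(prompt: str) -> str:
    """Stand-in for an actual (billed) OpenAI API call."""
    return f"response to: {prompt}"

def cached_completion(prompt: str) -> str:
    # Hash the normalized input so trivially identical prompts
    # (case, surrounding whitespace) map to the same cache key.
    key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = fake_model_call(prompt)  # pay for tokens only once
    return cache[key]

first = cached_completion("Translate 'hello' to French")
second = cached_completion("translate 'hello' to french")  # cache hit, no API call
print(first == second)  # True
```

In practice you would persist the cache in a real database, and only for prompts where a stale answer is acceptable.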

The points in the OP are just plain misinformed.

  1. Remember, BPE encoding is highly compressed. There are tokens that include newlines at the end of many common texts, including JSON formatting (and those tokens contain the inner dictionary or list closure as well). An indent of 2 characters or 40 characters? One token either way. However, “simply telling” the AI what to produce uses tokens too, along with the trials it takes to get what you expect.

You are only billed for language the AI produces or sees in its context. That does not include the JSON wrapper used for sending to and receiving from the API, nor even the function call structure.

  2. Temperature has nothing to do with whether the AI produces OpenAI’s fine-tuned denials.

  3. All words have an effect, so this is a very speculative answer. What does help is chopping a winding prompt in half, which simply serves to avoid confusing the AI’s language-processing mechanisms.

(image: example of combined characters)

About the only technique of benefit is to specify a single-space indent, which ensures the model produces tokens that start with a space (the more common variant found in sentences), and thus gives you extra semantic comprehension for free.
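If you want your few-shot examples to match that single-space format, `json.dumps` can produce it directly with `indent=1` (a sketch of the formatting only; whether the leading-space tokens actually improve comprehension is the claim above, not something shown here):

```python
import json

obj = {"name": "pikachu", "moves": ["thunderbolt", "quick-attack"]}

# indent=1 yields newline-plus-one-space indentation, so nested lines
# begin with a space, matching the token variant common in running text.
single_indent = json.dumps(obj, indent=1)
print(single_indent)
```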

I also feel like asking for JSON puts a greater “tax” on the complexity you can achieve in your output. YAML appears to strike a better balance between structure and semantic quality. Of course, that sort of thing takes a lot of testing.

I understand your reasoning, but you can ask for JSON to be returned on one line, which you cannot do with YAML. This makes JSON much cheaper than YAML.

I think you missed my point. I was referring to JSON returned in the response, as in structured data. This was posted before Function Calling was a thing.