Using the API? Here's how you can save up to 30% and increase reliability

Hey there! After building a few tools on top of the OpenAI API, I noticed a few things that many folks might not be aware of, things that could be adding up to 30% in unnecessary costs to your OpenAI usage. So, here’s what I’ve learned:

  1. Ensure your JSON is as lean as possible: OpenAI bills per token, and that includes whitespace and line breaks in your JSON responses. If you eliminate these extras both in sending and receiving data, you might save up to 30%! You simply need to tell OpenAI to “return JSON in a single line without whitespace”. Boom!
  2. Set temperature to 0 for structured responses: When expecting a structured response (like JSON), setting the temperature parameter to 0 helps the model stick strictly to your expected JSON structure. This helps prevent cases where you expect JSON, but something goes wrong and OpenAI responds with “Sorry, I am not sure I can …”.
  3. Robots don’t need you to be polite: Computers understand simple instructions well. Trimming redundant/filler words from your prompt can not only save money but also speed up execution. Words like “please”, “kindly”, “really”, “very”, and so on, can often be dropped without losing accuracy.

Tip: You can use the OpenAI Tokenizer to count the tokens of your requests/responses.
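Putting the three tips together, the request body for the chat completions endpoint might look something like this sketch (the model name and prompt wording are illustrative assumptions; whether the savings reach 30% is the claim above, not something shown here):

```python
import json

# Sketch of a chat completions request body; "gpt-3.5-turbo" and the
# prompt wording are illustrative assumptions, not a tested recipe.
request_body = {
    "model": "gpt-3.5-turbo",
    "temperature": 0,  # tip 2: deterministic output for structured responses
    "messages": [
        {
            "role": "system",
            # tips 1 and 3: a terse instruction asking for single-line JSON
            "content": "Return JSON in a single line without whitespace.",
        },
        {"role": "user", "content": "Extract name and age from: Bob is 34."},
    ],
}

# Tip 1 applied on our own side: json.dumps with separators=(",", ":")
# strips every space after ':' and ',' from the serialized body.
wire = json.dumps(request_body, separators=(",", ":"))
print(wire)
```

You can paste the resulting string into the Tokenizer to see the count for yourself.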

3 Likes

I have heard, but not verified, that YAML is easier for GPT to generate, AND is more compact than JSON, thus requiring fewer of those expensive output tokens.

1 Like

Quite the opposite. Indentation is a core part of YAML. The larger the payload, the more redundant spaces there are. You literally cannot minify YAML, because stripping the indentation renders it invalid. Single-line JSON or CSV is best for cost efficiency.

Take a sample JSON object (e.g. from the Pokemon API) and count the tokens using the Tokenizer. Then compare a YAML version of it vs. a minified JSON version of it: JSON will be cheaper in every scenario, and the gap will widen as the object grows.

1 Like

Yeah, I wondered about that.

On the other hand, if there isn’t a whole lot of indentation and the YAML is fairly flat, then we avoid the need for all the brace, colon, quote, and comma tokens used in JSON.

Fair enough, you can’t minify YAML.
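As a rough illustration of the flat-data point, a flat object can indeed be fewer characters in YAML than in minified JSON, because the quotes and braces outweigh the newlines. Character count is only a proxy here; actual token counts depend on the BPE and should be checked in the Tokenizer. The object below is a made-up example, not actual Pokemon API output:

```python
import json

obj = {"name": "pikachu", "id": 25, "types": ["electric"], "base_experience": 112}

# Minified JSON: no inter-token whitespace, but every key and string
# value still needs quotes, plus the surrounding braces and brackets.
minified_json = json.dumps(obj, separators=(",", ":"))

# Hand-written YAML equivalent of the same flat object; indentation is
# minimal because there is almost no nesting.
flat_yaml = (
    "name: pikachu\n"
    "id: 25\n"
    "types:\n"
    "- electric\n"
    "base_experience: 112\n"
)

print(len(minified_json), len(flat_yaml))  # the YAML version is shorter here
```

As the object nests deeper, YAML’s mandatory indentation grows with every level, which is where minified JSON pulls ahead.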

1 Like

Instead of using indentation, you can use runs of dashes (‘---------’, for example) with YAML.

The text input into the tokenizer is:

---------------
-------
--
---------------

And it recognizes each line as distinct, even with an unequal number of characters per line.

Results in:

Tokens: 7
Characters: 42

1 Like
  1. Ensure your JSON is as lean as possible: OpenAI bills per token, and that includes whitespaces and line breaks in your JSON responses. If you eliminate these extras both in sending and receiving data, you might save up to 30%! You simply need to tell OpenAI to “return JSON in a single-line without whitespaces”. Boom!

Is the JSON you mentioned the OpenAI API response JSON?

If not, what kind of JSON?

Just as a heads up, the cl100k_base token set used in the chat models has whitespace and formatting tokens for just about every common variation, so most of it takes a single token.

1 Like

Also, if you’re dealing with data that is often repeated, a database of hashed inputs and outputs may save you a lot…
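A minimal sketch of that idea using only the standard library: hash the normalized prompt, and on a repeat hit return the stored completion instead of calling the API. `fake_model_call` is a stand-in for the real API call, not an actual client function:

```python
import hashlib

cache: dict[str, str] = {}

def fake_model_call(prompt: str) -> str:
    """Stand-in for an actual (billed) OpenAI API call."""
    return f"response to: {prompt}"

def cached_completion(prompt: str) -> str:
    # Hash the normalized input so trivially identical prompts
    # (case, surrounding whitespace) map to the same cache key.
    key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = fake_model_call(prompt)  # pay for tokens only once
    return cache[key]

first = cached_completion("Translate 'hello' to French")
second = cached_completion("translate 'hello' to french")  # cache hit, no API call
print(first == second)  # True
```

In practice you would persist the cache in a real database, and only for prompts where a stale answer is acceptable.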

The points in the OP are just plain misinformed.

  1. Remember, BPE encoding is highly compressed. There are tokens that include newlines at the end of many common texts, including JSON formatting (and those tokens contain the inner dictionary or list closure as well). An indent of 2 characters or 40 characters? One token either way. However, “simply telling” the AI what to produce uses tokens too, along with the trials it takes to get what you expect.

You are only billed for language the AI produces or sees in its context. That does not include the JSON wrapper used for sending to and receiving from the API, nor even the function call structure.

  2. Temperature has nothing to do with whether the AI produces OpenAI’s fine-tuned denials.

  3. All words have an effect, so this is a very speculative answer. What does help is chopping a winding prompt in half, which simply serves to avoid confusing the AI’s language-processing mechanisms.

(image: example of combined characters)

About the only technique of benefit is to specify a single-space indent, which ensures the model produces tokens that start with a space (the more common variant found in sentences), and thus gives you extra semantic comprehension for free.
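If you want your few-shot examples to match that single-space format, `json.dumps` can produce it directly with `indent=1` (a sketch of the formatting only; whether the leading-space tokens actually improve comprehension is the claim above, not something shown here):

```python
import json

obj = {"name": "pikachu", "moves": ["thunderbolt", "quick-attack"]}

# indent=1 yields newline-plus-one-space indentation, so nested lines
# begin with a space, matching the token variant common in running text.
single_indent = json.dumps(obj, indent=1)
print(single_indent)
```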

I also feel like asking for JSON puts a greater “tax” on the complexity you can achieve in your output. YAML appears to strike a better balance between structure and semantic quality. Of course, that sort of thing takes a lot of testing.

I understand your reasoning, but you can ask for JSON to be returned on one line, which you cannot do with YAML. This makes JSON much cheaper than YAML.

I think you missed my point. I was referring to JSON returned in the response, as in structured data. This was posted before Function Calling was a thing.