Using the API? Here's how you can save up to 30% and increase reliability

Hey there! After building a few tools on top of the OpenAI API, I noticed a few things that many folks might not be aware of, things that could be adding up to 30% in unnecessary costs to your OpenAI usage. So, here’s what I’ve learned:

  1. Keep your JSON as lean as possible: OpenAI bills per token, and that includes whitespace and line breaks in your JSON. If you eliminate these extras both when sending and when receiving data, you might save up to 30%! You simply need to tell the model to “return the JSON on a single line without whitespace”. Boom! (There’s a short sketch of this right after the list.)
  2. Set temperature to 0 for structured responses: When you expect a structured response (like JSON), setting the temperature parameter to 0 makes the output deterministic and helps the model stick strictly to your expected JSON structure. This prevents cases where you expect JSON, something goes wrong, and OpenAI responds with “Sorry, I am not sure I can …”.
  3. Robots don’t need you to be polite: Computers understand simple instructions well. Trimming redundant or filler words from your prompt not only saves money but also speeds up execution. Words like “please”, “kindly”, “really”, and “very” can often be dropped without losing accuracy.
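
To make tips 1 and 2 concrete, here’s a minimal sketch (assuming the openai Python SDK; the payload, model name, and prompt are made up). The two pieces that matter are `json.dumps(..., separators=(",", ":"))`, which strips the whitespace, and `temperature=0`.

```python
import json

from openai import OpenAI  # assumes the openai Python SDK (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Made-up payload we want the model to transform.
payload = {"name": "pikachu", "types": ["electric"], "base_experience": 112}

# separators=(",", ":") drops the spaces json.dumps would otherwise insert.
minified = json.dumps(payload, separators=(",", ":"))

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; use whatever chat model you normally call
    temperature=0,          # tip 2: deterministic output, sticks to the structure
    messages=[
        {"role": "system",
         "content": "Return JSON on a single line without whitespace."},
        {"role": "user",
         "content": f"Translate the type names to German in this object: {minified}"},
    ],
)
print(response.choices[0].message.content)
```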

Tip: You can use the OpenAI Tokenizer to count the tokens of your requests/responses.
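
If you’d rather count tokens from code than paste into the web Tokenizer, the tiktoken library (OpenAI’s open-source tokenizer, not mentioned above) gives comparable counts. A quick sketch of the pretty-vs-minified difference:

```python
import json

import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

obj = {"name": "pikachu", "types": ["electric"]}
pretty = json.dumps(obj, indent=2)
minified = json.dumps(obj, separators=(",", ":"))

print(len(enc.encode(pretty)))    # indentation and line breaks are billed too
print(len(enc.encode(minified)))  # same data, fewer tokens
```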

2 Likes

I have heard, but not verified, that YAML is easier for GPT to generate, AND is more compact than JSON, thus requiring fewer of those expensive output tokens.

1 Like

Quite the opposite. Indentation is a core part of YAML: the larger the payload, the more redundant whitespace you pay for. You literally cannot make YAML compact, because removing the indentation renders it invalid. Single-line JSON or CSV are the best options for cost efficiency.

Take a sample JSON object (e.g. from the Pokemon API) and count the tokens using the Tokenizer. Then compare a YAML version of it vs. a minified JSON version: JSON will be cheaper in every scenario, and the gap will increase as the object grows.
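
If you want to run that comparison yourself, here’s a sketch (assuming tiktoken and PyYAML are installed; the object below is just a stand-in for a Pokemon API response):

```python
import json

import tiktoken  # pip install tiktoken
import yaml      # pip install pyyaml

# Stand-in for an object fetched from the Pokemon API.
obj = {
    "name": "pikachu",
    "id": 25,
    "types": [{"slot": 1, "type": {"name": "electric"}}],
    "stats": [{"base_stat": 35, "stat": {"name": "hp"}}],
}

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

as_yaml = yaml.dump(obj)
as_minified_json = json.dumps(obj, separators=(",", ":"))

print("YAML tokens:         ", len(enc.encode(as_yaml)))
print("Minified JSON tokens:", len(enc.encode(as_minified_json)))
```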

1 Like

Yeah, I wondered about that.

On the other hand, if there isn’t a whole lot of indentation and the YAML is fairly flat, then we avoid the need for all the brace, colon, quote, and comma tokens used in JSON.

Fair enough, you can’t minify YAML.

1 Like

Instead of using indents, you can use ‘---------’, for example, with YAML.

The text inputted into the tokenizer is:

---------------
-------
--
---------------

And it recognizes every line as different, even though the lines have unequal numbers of characters.

Results in:

Tokens: 7
Characters: 42

1 Like

Yes. I had the same argument.

I do wonder, though: does GPT work as well with minified JSON as it does with normally formatted JSON? :thinking:
Genuinely curious.

ChatCompletion needs logprobs!!!

1 Like