I decided to give predicted outputs a quick test run for “text proofreading”.
It seems like a good fit because the majority of the text is unchanged.
However:
- It increases costs
- The speed benefit is good but not earth-shattering
- It has an unexpectedly low hit rate
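For context, this is roughly the shape of the call being tested; a minimal sketch assuming the openai Python SDK, with the model, prompt, and draft text as placeholders rather than the exact ones from the experiment:

```python
# Minimal proofreading call with Predicted Outputs (sketch; model, prompt and
# draft are placeholders, not the exact ones used in the experiment).
from openai import OpenAI

client = OpenAI()

draft = "Text with a few typos to proofraed..."  # placeholder input

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Proofread the text and return only the corrected text."},
        {"role": "user", "content": draft},
    ],
    # Predicted Outputs: pass the original draft as the prediction, since most
    # of it should come back unchanged after proofreading.
    prediction={"type": "content", "content": draft},
)

print(response.choices[0].message.content)
print(response.usage)
```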
It increases costs
This is the clearest finding from the experiment: output tokens are predictably and consistently higher.
In the case of the text above, output tokens move from 208 → 303, as the usage breakdown shows:
"usage": {
"prompt_tokens": 677,
"completion_tokens": 303,
"total_tokens": 980,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 152,
"rejected_prediction_tokens": 94
}
},
It appears that you pay for rejected prediction tokens at output-token rates, so there is a brand new cost on top of your normal output tokens.
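To make the impact concrete, here is a rough back-of-the-envelope calculation based on the usage block above. It assumes rejected prediction tokens are billed at the normal output rate and that completion_tokens already includes them (303 − 94 ≈ the original 208); the price constant is a placeholder.

```python
# Rough cost impact of rejected prediction tokens, using the usage block above.
# Assumes rejected tokens are billed at the normal output rate and that
# completion_tokens already includes them; the price is a placeholder.
OUTPUT_PRICE_PER_1M_TOKENS = 10.00  # USD, placeholder; check the current pricing page

usage = {
    "completion_tokens": 303,
    "accepted_prediction_tokens": 152,
    "rejected_prediction_tokens": 94,
}

rejected = usage["rejected_prediction_tokens"]
billed_without_rejections = usage["completion_tokens"] - rejected  # ~209 tokens

extra_cost = rejected * OUTPUT_PRICE_PER_1M_TOKENS / 1_000_000
overhead = rejected / billed_without_rejections

print(f"Extra billed output tokens: {rejected}")
print(f"Extra cost for this request: ${extra_cost:.6f}")
print(f"Output-token overhead: {overhead:.0%}")  # roughly 45% more output tokens
```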
Speed benefits
Anecdotally, across three runs on GPT-4o:
- With predicted outputs: 1319 ms, 2262 ms, 1596 ms
- Without predicted outputs: 2829 ms, 6948 ms, 3706 ms
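For anyone who wants to reproduce the comparison, here is a minimal timing sketch, again assuming the openai Python SDK (model, prompt, and draft are placeholders):

```python
# Time the same proofreading request with and without a prediction (sketch).
import time

from openai import OpenAI

client = OpenAI()
draft = "Text with a few typos to proofraed..."  # placeholder input

def proofread_ms(use_prediction: bool) -> float:
    kwargs = {}
    if use_prediction:
        kwargs["prediction"] = {"type": "content", "content": draft}
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Proofread the text and return only the corrected text."},
            {"role": "user", "content": draft},
        ],
        **kwargs,
    )
    return (time.perf_counter() - start) * 1000  # wall-clock latency in ms

print(f"with prediction:    {proofread_ms(True):.0f} ms")
print(f"without prediction: {proofread_ms(False):.0f} ms")
```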
Additionally, when testing larger bodies of text, it seems to hit pathological states quite easily, where zero predicted tokens get accepted.
Furthermore, a typo near the beginning of a body of text can lead to rejections from that point onwards, which end up being counterproductive. It is very unclear when a mismatch will cause 60 tokens to be rejected versus the entire post.
It has an unexpectedly low hit rate
This feels like the biggest problem with the feature. Ninety-four tokens can be rejected for a prediction that is only four tokens off, which makes it very hard to tune the system.
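One thing that helps when trying to tune this is logging the hit rate per request. A minimal sketch using the field names from the usage block above (it takes the completion_tokens_details dict):

```python
# Fraction of predicted tokens that were actually accepted for one request.
def prediction_hit_rate(completion_tokens_details: dict) -> float:
    accepted = completion_tokens_details.get("accepted_prediction_tokens", 0)
    rejected = completion_tokens_details.get("rejected_prediction_tokens", 0)
    predicted = accepted + rejected
    return accepted / predicted if predicted else 0.0

# Using the numbers above: 152 / (152 + 94) ≈ 0.62, i.e. ~62% of predicted
# tokens were accepted, despite the prediction being only a few tokens off.
print(prediction_hit_rate({"accepted_prediction_tokens": 152,
                           "rejected_prediction_tokens": 94}))
```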
I'm wondering what other people have been experiencing, and whether proofreading is a good use case for this feature.