Using predicted outputs for proofreading

I decided to give predicted outputs a quick test run for “text proofreading”.

It seems like a good fit because the majority of the text is unchanged.

However:

  1. It increases costs
  2. Speed benefit is good but not earth-shattering
  3. It has an unexpectedly low hit rate

It increases costs

This is the clearest thing from the experiment. Output tokens are predictably and consistently higher.

In the case of the text above, output tokens move from 208 → 303 due to:

 "usage": {
    "prompt_tokens": 677,
    "completion_tokens": 303,
    "total_tokens": 980,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 152,
      "rejected_prediction_tokens": 94
    }
  },

It appears that you pay for rejected tokens, so there is a brand new cost on top of your output tokens.

Speed benefits

Anecdotally, looking at 3 runs on GPT4o:

With predicted output, 1319ms - 2262ms - 1596ms
Without predicted output, 2829ms - 6948ms - 3706ms

Additionally, when testing larger bodies of text, it appears to hit pathological states where it gets 0 predictions quite easily.

Furthermore, a typo at the beginning of a body of text can lead to rejections from that point onwards, which end up being counterproductive. It is very unclear when it will cause 60 tokens to be rejected vs the entire post.

It has an unexpectedly low hit rate

This feels like the biggest problem of the system. Ninety-four tokens can be rejected for a prediction that is only four tokens off. This makes it very hard to hone the system.


Wondering what other people have been experiencing and if proofreading is a good use for this feature?

5 Likes

I provided the same analysis.

  • It always costs more
  • It should be costing OpenAI less
  • There is token billing overlap in prediction hits and misses
  • Speed is slower without majority context matching

Conclusion: If you don’t want to always pay more for dubious speed benefit in an arbitrary or even targeted application - don’t.

3 Likes