Using predicted outputs for proofreading

I decided to give predicted outputs a quick test run for “text proofreading”.

It seems like a good fit because the majority of the text is unchanged.

However:

  1. It increases costs
  2. Speed benefit is good but not earth-shattering
  3. It has an unexpectedly low hit rate

It increases costs

This is the clearest finding from the experiment. Output tokens are predictably and consistently higher.

In the case of the text above, output tokens move from 208 → 303, as the usage breakdown shows:

"usage": {
  "prompt_tokens": 677,
  "completion_tokens": 303,
  "total_tokens": 980,
  "prompt_tokens_details": {
    "cached_tokens": 0,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 152,
    "rejected_prediction_tokens": 94
  }
},

It appears that you pay for rejected tokens, so rejected predictions add a brand-new cost on top of your normal output tokens.
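The overhead is easy to quantify from the usage block above. A quick sketch (the per-token price is a placeholder assumption, not an actual published GPT-4o rate):

```python
# Usage breakdown copied from the response above; only the fields
# needed for the calculation are kept.
usage = {
    "completion_tokens": 303,
    "completion_tokens_details": {
        "accepted_prediction_tokens": 152,
        "rejected_prediction_tokens": 94,
    },
}

# Placeholder price, NOT an actual published rate.
PRICE_PER_OUTPUT_TOKEN = 10.00 / 1_000_000

billed = usage["completion_tokens"]
rejected = usage["completion_tokens_details"]["rejected_prediction_tokens"]

visible = billed - rejected  # 209, roughly the 208 billed without prediction
overhead_pct = round(100 * rejected / billed)  # share of the bill that is pure waste
overhead_cost = rejected * PRICE_PER_OUTPUT_TOKEN

print(f"visible={visible}, rejected={rejected}, overhead={overhead_pct}%")
```

In this run, roughly a third of the output bill went to tokens the user never sees.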

Speed benefits

Anecdotally, looking at 3 runs on GPT-4o:

With predicted output: 1319 ms, 2262 ms, 1596 ms
Without predicted output: 2829 ms, 6948 ms, 3706 ms

Additionally, when testing larger bodies of text, it quite easily hits pathological states where zero prediction tokens are accepted.

Furthermore, a typo at the beginning of a body of text can lead to rejections from that point onwards, which ends up being counterproductive. It is very unclear when a change will cause 60 tokens to be rejected versus the entire post.
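One way to see why an early edit is so much more damaging than a late one is to compare the shared prefix between the prediction and the actual output. This is a simplified model (the real matcher can apparently resync mid-text, which is exactly what makes it unpredictable), but it illustrates the asymmetry:

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared character prefix between two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Corrected output the model actually produces.
base = "This is a long post with many paragraphs that follow the intro."

# Same text as the prediction, with a typo fixed early vs. late.
early_typo = "Thsi is a long post with many paragraphs that follow the intro."
late_typo = "This is a long post with many paragraphs that folow the intro."

# An early divergence leaves almost no usable prefix; a late one
# preserves most of the prediction.
print(common_prefix_len(early_typo, base))  # 2
print(common_prefix_len(late_typo, base))
```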

It has an unexpectedly low hit rate

This feels like the biggest problem with the system. Ninety-four tokens can be rejected for a prediction that is only four tokens off, which makes the system very hard to hone.

I’m wondering what other people have been experiencing, and whether proofreading is a good use for this feature.

6 Likes

I provided the same analysis.

  • It always costs more
  • It should be costing OpenAI less
  • There is token billing overlap in prediction hits and misses
  • Speed is slower without majority context matching

Conclusion: if you don’t want to always pay more for a dubious speed benefit, whether in an arbitrary or even a targeted application, don’t.

3 Likes

We don’t use predicted outputs. Why? Because proofreading is:

  • Grammar: This involves checking for issues like subject-verb agreement, verb tense, and proper sentence structure.
  • Spelling: This includes catching misspellings and also words that are spelled correctly but are the wrong word (like “their” instead of “there”).
  • Punctuation and formatting: Proofreading also addresses mistakes in punctuation and ensures consistent formatting throughout the document.

What is the goal? Proofreading or using predicted outputs? Using predicted outputs is problematic, especially if suggested edits are highlighted - it can get messy with grammatical changes. And this assumes that predicted outputs can handle all grammar, spelling, punctuation, and formatting, which is a TALL order.

We take a blunter, but time-tested, approach:

Identity

You are an advanced language model specializing in grammar correction, punctuation, and spelling.

Instructions

  • When given any text, your task is to identify and correct all grammatical errors, punctuation mistakes, and spelling issues.
  • Ensure the revised text uses standard [Language] conventions, maintains the original meaning, and improves clarity and readability.
  • Do not alter the tone or intent of the original text. Provide only the corrected version of the text without explanations or additional commentary.

User Prompt: Convert the following statements to standard [Language]: [Text Block]

or for a document: Convert the document statements to standard [Language]:

Note: [Language] is the language that you are using (e.g., English, French, Hindi, etc.)

We can do this for selected text blocks or entire documents.
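For anyone who wants to wire this up, the prompt above maps onto a standard Chat Completions request body. A sketch, with the model name and sample text as illustrative assumptions:

```python
# Sketch of a plain (no predicted-outputs) proofreading request body.
# Model name and sample text are illustrative assumptions.
SYSTEM_PROMPT = (
    "You are an advanced language model specializing in grammar "
    "correction, punctuation, and spelling. When given any text, "
    "identify and correct all grammatical errors, punctuation "
    "mistakes, and spelling issues. Provide only the corrected "
    "version of the text without explanations or additional commentary."
)

def build_proofread_request(text: str, language: str = "English") -> dict:
    """Build the request body; `language` fills the [Language] placeholder."""
    return {
        "model": "gpt-4o",  # assumed model choice
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": f"Convert the following statements to standard {language}: {text}",
            },
        ],
    }

payload = build_proofread_request("Their going to the park tommorow.")
```

The same body works for a selected text block or an entire document; only the user-message wording changes.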

There is a bot bump with a huge misunderstanding here.

This topic is specifically about an API parameter that only works on gpt-4o-2024-08-06, and which shall be forgotten. You send the text of a “prediction” in an API request field, and if the AI writes the same thing, it might produce faster output. If the AI doesn’t write the same thing, you get billed for prediction text the AI never produced, starting from a point of slower output.
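For reference, the parameter in question sits alongside the normal messages in the request body. A minimal sketch - the field shape follows the documented `prediction` parameter, while the prompt text is illustrative:

```python
# Minimal request shape for the predicted-outputs parameter described
# above. The "prediction" field shape follows the documented API; the
# prompt text here is illustrative only.
original_text = "Teh quick brown fox jumps over the lazy dog."

payload = {
    "model": "gpt-4o-2024-08-06",  # the only model this topic covers
    "messages": [
        {"role": "system", "content": "Proofread the user's text."},
        {"role": "user", "content": original_text},
    ],
    # The unedited source doubles as the prediction: output tokens that
    # match it may come back faster; diverging ones show up in usage as
    # rejected_prediction_tokens and are billed anyway.
    "prediction": {"type": "content", "content": original_text},
}
```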

1 Like