I tried the example provided on the official website, and there was indeed a good speed improvement. However, when the input content is 10 times the size of the official example, the speed does not change noticeably with or without the prediction parameter. Have any other developers had the same experience?

```python
code = """
class User {
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
  firstName: string = "";
  lastName: string = "";
  username: string = "";
}
"""
```
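For context, my request follows the official Predicted Outputs example (a minimal sketch using the openai Python SDK; the rewrite instruction and model shown here are the official sample's, not necessarily my exact prompt):

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Replace the username property with an email property. Respond only with code, and with no markdown formatting.",
        },
        {"role": "user", "content": code},
    ],
    # Predicted Outputs: pass the expected output verbatim as the prediction
    prediction={"type": "content", "content": code},
)

print(completion.choices[0].message.content)
```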
Using predictions at all starts from a baseline speed penalty: even the most minimal prediction lowers the token production rate, and that penalty persists well into real applications.
A document that is a near-exact match, with only the most minor of changes (such as "bold some words"), does get a rate increase, but it comes with rejected-token costs well in excess of the actual alterations.
I tested many scenarios to find where the feature is useful, and while cases like the latter can produce a speedup at higher expense, I could not come up with any good scenario where it could be enabled on arbitrary or task-based user input without the net effect being slower output.
Any tool I could envision, such as a code-canvas feature that has the AI output the whole document again, would be gambling on verbatim AI output (which gpt-4o handles poorly across iterations), versus a tool implementation that would simply be faster by not requiring an entire reproduction.
My impression from the excess costs is that the technology operates on longer runs of tokens, so more modifications make the total tally of accepted plus rejected tokens grow even higher than the amount actually sent as the prediction.
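Those counts are reported in the usage details of the response; here is a minimal sketch of reading them (the recitation prompt is only an illustration):

```python
from openai import OpenAI

client = OpenAI()

text = "The quick brown fox jumps over the lazy dog."

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Repeat this exactly: {text}"}],
    prediction={"type": "content", "content": text},
)

# Rejected prediction tokens are still billed as output tokens.
details = resp.usage.completion_tokens_details
print("completion tokens:", resp.usage.completion_tokens)
print("accepted prediction tokens:", details.accepted_prediction_tokens)
print("rejected prediction tokens:", details.rejected_prediction_tokens)
```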
I did not evaluate sending a much larger prediction input, since paying 10x more for an AI generation of dubious benefit was not worth it to me. If you are doing that, you are already gambling (would you send the whole chat history on the chance of a match?). I also did not push to extreme lengths, as going beyond the response lengths the model was trained on just invites more random AI alterations to the source.
A direct evaluation of your symptom would be to benchmark a recitation task with no prediction, then with a minimal prediction, then an identity prediction (the exact expected output), and then a far larger one.
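A rough sketch of such a benchmark (assuming the openai Python SDK; the document text and model choice are placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()

document = "\n".join(f"line {i}: the quick brown fox jumps over the lazy dog" for i in range(40))
prompt = f"Repeat the following text exactly, with no changes:\n\n{document}"

# Conditions to compare: no prediction, a tiny prediction, and an identity prediction.
conditions = {
    "none": None,
    "minimal": {"type": "content", "content": "line 0:"},
    "identity": {"type": "content", "content": document},
}

for name, prediction in conditions.items():
    kwargs = {"prediction": prediction} if prediction else {}
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    elapsed = time.perf_counter() - start
    tokens = resp.usage.completion_tokens
    print(f"{name:>8}: {elapsed:.2f}s, {tokens} completion tokens, {tokens / elapsed:.1f} tok/s")
```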