Sending a prediction seems to have the opposite of the desired effect: lower output speeds. When nothing matches, the prediction parameter gets you slower, more expensive AI instead of having no effect.
Predictions sent for a task of writing about cute kittens (nothing cacheable; top_p: 0.0001, which still doesn't produce a consistent length):

- prediction 0: (empty string)
- prediction 1: "kitten kittens cute"
- prediction 2: the text portion of the predicted-outputs documentation
Calls are interleaved.
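For reference, a trial like the ones above can be sent with the OpenAI Python SDK's `prediction` parameter; this is only a sketch of the request construction (the model name and prompt here are illustrative, not taken from the actual runs):

```python
# Sketch: build chat.completions.create() kwargs for one trial,
# optionally attaching a predicted output.
def build_request(prediction_text=None):
    kwargs = {
        "model": "gpt-4o",  # assumed model; the post doesn't name one
        "messages": [{"role": "user", "content": "Write about cute kittens."}],
        "top_p": 0.0001,  # near-greedy sampling, as in the trials above
        "stream": True,
        # usage chunk carries accepted/rejected prediction token counts
        "stream_options": {"include_usage": True},
    }
    if prediction_text is not None:
        kwargs["prediction"] = {"type": "content", "content": prediction_text}
    return kwargs

# predictions 0 and 1 from the list above
requests = [build_request(p) for p in ["", "kitten kittens cute"]]
```

Each dict would then be passed as `client.chat.completions.create(**kwargs)` inside the timing loop.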
For 5 trials of prediction 0 @ 2024-11-04 08:44PM:

| Stat | Average | Cold | Minimum | Maximum |
|---|---|---|---|---|
| stream rate | 123.340 | 103.2 | 103.2 | 154.4 |
| latency (s) | 0.537 | 0.5426 | 0.37 | 0.9877 |
| total response (s) | 1.788 | 2.1319 | 1.5359 | 2.1319 |
| total rate | 86.413 | 77.396 | 77.396 | 92.525 |
| response tokens | 153.000 | 165 | 137 | 169 |
| cost tokens | 154.800 | 167 | 138 | 171 |
| prediction tokens | 0.000 | 0 | 0 | 0 |
| accepted tokens | 0.000 | 0 | 0 | 0 |
| rejected tokens | 0.800 | 1 | 0 | 1 |
For 5 trials of prediction 1 @ 2024-11-04 08:44PM:

| Stat | Average | Cold | Minimum | Maximum |
|---|---|---|---|---|
| stream rate | 88.180 | 85.5 | 54.6 | 115.8 |
| latency (s) | 0.801 | 0.5172 | 0.3544 | 1.806 |
| total response (s) | 2.568 | 2.3072 | 1.658 | 3.4577 |
| total rate | 62.057 | 66.748 | 39.622 | 91.677 |
| response tokens | 147.600 | 154 | 137 | 154 |
| cost tokens | 152.400 | 159 | 141 | 159 |
| prediction tokens | 4.000 | 4 | 4 | 4 |
| accepted tokens | 0.000 | 0 | 0 | 0 |
| rejected tokens | 3.800 | 4 | 3 | 4 |
For 5 trials of prediction 2 @ 2024-11-04 08:44PM:

| Stat | Average | Cold | Minimum | Maximum |
|---|---|---|---|---|
| stream rate | 91.560 | 86.8 | 68.8 | 118.1 |
| latency (s) | 0.509 | 0.6555 | 0.3166 | 0.6555 |
| total response (s) | 2.156 | 2.2915 | 1.7055 | 2.3228 |
| total rate | 69.247 | 62.405 | 59.813 | 85.019 |
| response tokens | 147.400 | 143 | 138 | 160 |
| cost tokens | 211.600 | 216 | 180 | 233 |
| prediction tokens | 353.000 | 353 | 353 | 353 |
| accepted tokens | 0.000 | 0 | 0 | 0 |
| rejected tokens | 63.200 | 72 | 28 | 72 |
Token Usage Log: 5 trials of 0:

| Measured | Completion | Prediction | Accepted | Rejected |
|---|---|---|---|---|
| 165 | 167 | 0 | 0 | 1 |
| 142 | 144 | 0 | 0 | 1 |
| 169 | 171 | 0 | 0 | 1 |
| 152 | 154 | 0 | 0 | 1 |
| 137 | 138 | 0 | 0 | 0 |
Token Usage Log: 5 trials of 1:

| Measured | Completion | Prediction | Accepted | Rejected |
|---|---|---|---|---|
| 154 | 159 | 4 | 0 | 4 |
| 148 | 153 | 4 | 0 | 4 |
| 137 | 141 | 4 | 0 | 3 |
| 147 | 152 | 4 | 0 | 4 |
| 152 | 157 | 4 | 0 | 4 |
Token Usage Log: 5 trials of 2:

| Measured | Completion | Prediction | Accepted | Rejected |
|---|---|---|---|---|
| 143 | 216 | 353 | 0 | 72 |
| 138 | 211 | 353 | 0 | 72 |
| 160 | 233 | 353 | 0 | 72 |
| 145 | 218 | 353 | 0 | 72 |
| 151 | 180 | 353 | 0 | 28 |
The accounting is all goofy. "Measured" and "Prediction" are the response text and the sent prediction as counted by tiktoken; the rest come back in the API's usage chunk. I can't even be certain whether I get billed for an empty-string prediction…
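Curiously, in every single logged trial the billed completion count equals measured + rejected + 1; whether that +1 is an untracked special token is my guess, but the arithmetic itself holds across all fifteen rows:

```python
# (measured, completion, rejected) triples from the three token-usage logs above
trials = [
    (165, 167, 1), (142, 144, 1), (169, 171, 1), (152, 154, 1), (137, 138, 0),
    (154, 159, 4), (148, 153, 4), (137, 141, 3), (147, 152, 4), (152, 157, 4),
    (143, 216, 72), (138, 211, 72), (160, 233, 72), (145, 218, 72), (151, 180, 28),
]

# Check the pattern: completion == measured + rejected + 1 in every trial
assert all(completion == measured + rejected + 1
           for measured, completion, rejected in trials)
```

So rejected prediction tokens appear to be billed on top of the tiktoken-measured output, one for one.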