Introducing Predicted Outputs

Sending a prediction seems to have the opposite of the intended effect: lower output speeds.

  • Using the prediction parameter gets you slower, more expensive AI when there is no match, rather than simply having no effect.

Predictions sent for a task of writing about cute kittens
(nothing cacheable; top_p: 0.0001, which still doesn’t yield a consistent output length):

  • prediction 0: (empty string)
  • prediction 1: “kitten kittens cute”
  • prediction 2: the text part of the predicted output documentation

Calls are interleaved.
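For reference, a single trial looks roughly like the sketch below, using the Chat Completions API with the prediction parameter. The model name, prompt, and timing details are my reconstruction, not necessarily what produced the numbers that follow.

```python
# Sketch of one benchmark trial: stream a completion with a prediction
# attached and record latency, total time, and the usage chunk.
# Model, prompt, and sampling values are illustrative assumptions.
import time
from openai import OpenAI

client = OpenAI()

def run_trial(prediction_text: str) -> dict:
    start = time.perf_counter()
    first_token_at = None
    pieces = []
    usage = None

    stream = client.chat.completions.create(
        model="gpt-4o",  # assumption: any model that supports Predicted Outputs
        messages=[{"role": "user", "content": "Write about cute kittens."}],
        prediction={"type": "content", "content": prediction_text},
        top_p=0.0001,
        stream=True,
        stream_options={"include_usage": True},  # final chunk carries usage
    )

    for chunk in stream:
        if chunk.usage is not None:
            usage = chunk.usage
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            pieces.append(chunk.choices[0].delta.content)

    total = time.perf_counter() - start
    return {
        "latency_s": (first_token_at or start) - start,  # time to first token
        "total_s": total,
        "response_text": "".join(pieces),
        "usage": usage,
    }
```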

For 5 trials of prediction 0 @ 2024-11-04 08:44PM:

Stat                 Average    Cold      Minimum   Maximum
stream rate (tok/s)  123.340    103.2     103.2     154.4
latency (s)          0.537      0.5426    0.37      0.9877
total response (s)   1.788      2.1319    1.5359    2.1319
total rate (tok/s)   86.413     77.396    77.396    92.525
response tokens      153.000    165       137       169
cost tokens          154.800    167       138       171
prediction tokens    0.000      0         0         0
accepted tokens      0.000      0         0         0
rejected tokens      0.800      1         0         1

For 5 trials of prediction 1 @ 2024-11-04 08:44PM:

Stat                 Average    Cold      Minimum   Maximum
stream rate (tok/s)  88.180     85.5      54.6      115.8
latency (s)          0.801      0.5172    0.3544    1.806
total response (s)   2.568      2.3072    1.658     3.4577
total rate (tok/s)   62.057     66.748    39.622    91.677
response tokens      147.600    154       137       154
cost tokens          152.400    159       141       159
prediction tokens    4.000      4         4         4
accepted tokens      0.000      0         0         0
rejected tokens      3.800      4         3         4

For 5 trials of prediction 2 @ 2024-11-04 08:44PM:

Stat                 Average    Cold      Minimum   Maximum
stream rate (tok/s)  91.560     86.8      68.8      118.1
latency (s)          0.509      0.6555    0.3166    0.6555
total response (s)   2.156      2.2915    1.7055    2.3228
total rate (tok/s)   69.247     62.405    59.813    85.019
response tokens      147.400    143       138       160
cost tokens          211.600    216       180       233
prediction tokens    353.000    353       353       353
accepted tokens      0.000      0         0         0
rejected tokens      63.200     72        28        72

Token Usage Log: 5 trials of prediction 0:

Measured Completion Prediction Accepted Rejected
165 167 0 0 1
142 144 0 0 1
169 171 0 0 1
152 154 0 0 1
137 138 0 0 0

Token Usage Log: 5 trials of prediction 1:

Measured Completion Prediction Accepted Rejected
154 159 4 0 4
148 153 4 0 4
137 141 4 0 3
147 152 4 0 4
152 157 4 0 4

Token Usage Log: 5 trials of prediction 2:

Measured Completion Prediction Accepted Rejected
143 216 353 0 72
138 211 353 0 72
160 233 353 0 72
145 218 353 0 72
151 180 353 0 28

The accounting is all goofy. “Measured” and “prediction” are the response and the sent prediction as counted by tiktoken; the rest are returned in the usage chunk. I can’t even be certain whether I get billed for an empty string…
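If it helps, here is roughly how those columns line up, assuming the usage object from the final stream chunk exposes completion_tokens_details with accepted/rejected prediction counts (as documented for Predicted Outputs); the tokenizer choice is my assumption.

```python
# Sketch of the accounting above: "measured" and "prediction" are counted
# locally with tiktoken; the rest come straight from the API usage object.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")  # assumption: matching tokenizer

def usage_row(response_text: str, prediction_text: str, usage) -> dict:
    details = usage.completion_tokens_details
    return {
        "measured": len(enc.encode(response_text)),      # response, counted locally
        "completion": usage.completion_tokens,           # billed completion tokens
        "prediction": len(enc.encode(prediction_text)),  # sent prediction, counted locally
        "accepted": details.accepted_prediction_tokens,
        "rejected": details.rejected_prediction_tokens,
    }
```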
