Sending a prediction seems to have the opposite of the desired effect: lower output speeds. When nothing matches, the prediction parameter gets you slower, more expensive AI instead of having no effect.
Predictions sent for a task of writing about cute kittens (nothing cacheable; top_p: 0.0001, which still doesn't produce a consistent length):

- prediction 0: (empty string)
- prediction 1: "kitten kittens cute"
- prediction 2: the text portion of the predicted-outputs documentation
Calls are interleaved.
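For reference, a trial like the ones above can be sent with the OpenAI Python SDK's `prediction` parameter; this is only a sketch of the request construction (the model name and prompt here are illustrative, not taken from the actual runs):

```python
# Sketch: build chat.completions.create() kwargs for one trial,
# optionally attaching a predicted output.
def build_request(prediction_text=None):
    kwargs = {
        "model": "gpt-4o",  # assumed model; the post doesn't name one
        "messages": [{"role": "user", "content": "Write about cute kittens."}],
        "top_p": 0.0001,  # near-greedy sampling, as in the trials above
        "stream": True,
        # usage chunk carries accepted/rejected prediction token counts
        "stream_options": {"include_usage": True},
    }
    if prediction_text is not None:
        kwargs["prediction"] = {"type": "content", "content": prediction_text}
    return kwargs

# predictions 0 and 1 from the list above
requests = [build_request(p) for p in ["", "kitten kittens cute"]]
```

Each dict would then be passed as `client.chat.completions.create(**kwargs)` inside the timing loop.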
For 5 trials of prediction 0 @ 2024-11-04 08:44PM:

| Stat | Average | Cold | Minimum | Maximum |
|---|---|---|---|---|
| stream rate | 123.340 | 103.2 | 103.2 | 154.4 |
| latency (s) | 0.537 | 0.5426 | 0.37 | 0.9877 |
| total response (s) | 1.788 | 2.1319 | 1.5359 | 2.1319 |
| total rate | 86.413 | 77.396 | 77.396 | 92.525 |
| response tokens | 153.000 | 165 | 137 | 169 |
| cost tokens | 154.800 | 167 | 138 | 171 |
| prediction tokens | 0.000 | 0 | 0 | 0 |
| accepted tokens | 0.000 | 0 | 0 | 0 |
| rejected tokens | 0.800 | 1 | 0 | 1 |
For 5 trials of prediction 1 @ 2024-11-04 08:44PM:

| Stat | Average | Cold | Minimum | Maximum |
|---|---|---|---|---|
| stream rate | 88.180 | 85.5 | 54.6 | 115.8 |
| latency (s) | 0.801 | 0.5172 | 0.3544 | 1.806 |
| total response (s) | 2.568 | 2.3072 | 1.658 | 3.4577 |
| total rate | 62.057 | 66.748 | 39.622 | 91.677 |
| response tokens | 147.600 | 154 | 137 | 154 |
| cost tokens | 152.400 | 159 | 141 | 159 |
| prediction tokens | 4.000 | 4 | 4 | 4 |
| accepted tokens | 0.000 | 0 | 0 | 0 |
| rejected tokens | 3.800 | 4 | 3 | 4 |
For 5 trials of prediction 2 @ 2024-11-04 08:44PM:

| Stat | Average | Cold | Minimum | Maximum |
|---|---|---|---|---|
| stream rate | 91.560 | 86.8 | 68.8 | 118.1 |
| latency (s) | 0.509 | 0.6555 | 0.3166 | 0.6555 |
| total response (s) | 2.156 | 2.2915 | 1.7055 | 2.3228 |
| total rate | 69.247 | 62.405 | 59.813 | 85.019 |
| response tokens | 147.400 | 143 | 138 | 160 |
| cost tokens | 211.600 | 216 | 180 | 233 |
| prediction tokens | 353.000 | 353 | 353 | 353 |
| accepted tokens | 0.000 | 0 | 0 | 0 |
| rejected tokens | 63.200 | 72 | 28 | 72 |
Token Usage Log: 5 trials of 0:

| Measured | Completion | Prediction | Accepted | Rejected |
|---|---|---|---|---|
| 165 | 167 | 0 | 0 | 1 |
| 142 | 144 | 0 | 0 | 1 |
| 169 | 171 | 0 | 0 | 1 |
| 152 | 154 | 0 | 0 | 1 |
| 137 | 138 | 0 | 0 | 0 |
Token Usage Log: 5 trials of 1:

| Measured | Completion | Prediction | Accepted | Rejected |
|---|---|---|---|---|
| 154 | 159 | 4 | 0 | 4 |
| 148 | 153 | 4 | 0 | 4 |
| 137 | 141 | 4 | 0 | 3 |
| 147 | 152 | 4 | 0 | 4 |
| 152 | 157 | 4 | 0 | 4 |
Token Usage Log: 5 trials of 2:

| Measured | Completion | Prediction | Accepted | Rejected |
|---|---|---|---|---|
| 143 | 216 | 353 | 0 | 72 |
| 138 | 211 | 353 | 0 | 72 |
| 160 | 233 | 353 | 0 | 72 |
| 145 | 218 | 353 | 0 | 72 |
| 151 | 180 | 353 | 0 | 28 |
The accounting is all goofy. "Measured" and "Prediction" are the response text and the sent prediction as counted by tiktoken; the rest come back in the API's usage chunk. I can't even be certain whether I get billed for an empty-string prediction…
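Curiously, in every single logged trial the billed completion count equals measured + rejected + 1; whether that +1 is an untracked special token is my guess, but the arithmetic itself holds across all fifteen rows:

```python
# (measured, completion, rejected) triples from the three token-usage logs above
trials = [
    (165, 167, 1), (142, 144, 1), (169, 171, 1), (152, 154, 1), (137, 138, 0),
    (154, 159, 4), (148, 153, 4), (137, 141, 3), (147, 152, 4), (152, 157, 4),
    (143, 216, 72), (138, 211, 72), (160, 233, 72), (145, 218, 72), (151, 180, 28),
]

# Check the pattern: completion == measured + rejected + 1 in every trial
assert all(completion == measured + rejected + 1
           for measured, completion, rejected in trials)
```

So rejected prediction tokens appear to be billed on top of the tiktoken-measured output, one for one.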