I still don’t get a byte/character position offset from the chat models: [gpt-4o-mini, gpt-4o, gpt-4-turbo…]

EDIT: Derp, it’s only the text “completions” endpoint, with a completion model like gpt-3.5-turbo-instruct, that offers an "offset" in the logprobs return. (That's what I get for clicking an offered link without seeing the destination.)
API reference documentation of completions response object:
choices[0] as received by a RESTful API call, with chat logprobs:
{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": "Testing successful!",
    "refusal": null
  },
  "logprobs": {
    "content": [
      {
        "token": "Testing",
        "logprob": -0.1450273,
        "bytes": [84, 101, 115, 116, 105, 110, 103],
        "top_logprobs": [
          {
            "token": "Testing",
            "logprob": -0.1450273,
            "bytes": [84, 101, 115, 116, 105, 110, 103]
          },
          {
            "token": "Test",
            "logprob": -2.7700274,
            "bytes": [84, 101, 115, 116]
          },
          {
            "token": "It",
            "logprob": -3.2700274,
            "bytes": [73, 116]
          }
        ]
      },
      {
        "token": " successful",
        "logprob": -0.04402329,
        "bytes": [32, 115, 117, 99, 99, 101, 115, 115, 102, 117, 108],
        "top_logprobs": [
          {
            "token": " successful",
            "logprob": -0.04402329,
            "bytes": [32, 115, 117, 99, 99, 101, 115, 115, 102, 117, 108]
          },
          {
            "token": " received",
            "logprob": -3.6690233,
            "bytes": [32, 114, 101, 99, 101, 105, 118, 101, 100]
          },
          {
            "token": " complete",
            "logprob": -4.9190235,
            "bytes": [32, 99, 111, 109, 112, 108, 101, 116, 101]
          }
        ]
      },
      {
        "token": "!",
        "logprob": 0.0,
        "bytes": [33],
        "top_logprobs": [
          {
            "token": "!",
            "logprob": 0.0,
            "bytes": [33]
          },
          {
            "token": ".",
            "logprob": -16.875,
            "bytes": [46]
          },
          {
            "token": "\u2014",
            "logprob": -19.75,
            "bytes": [226, 128, 148]
          }
        ]
      }
    ],
    "refusal": null
  },
  "finish_reason": "length"
}
Position can be computed client-side if you can decode the token bytes as UTF-8 as reliably as OpenAI does (or better) for your application. Byte positions and character positions don’t correspond directly, because a single Unicode character can be a multi-byte sequence split across several tokens: think right-to-left text controls, half- or full-width East Asian characters, and glyphs assembled from multiple code points.
It would be improved further by just receiving the dang token number.
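For what it's worth, here is a minimal sketch of the client-side approach, assuming the chat logprobs shape shown above (a list of items that each carry "token" and "bytes"). The function name `token_offsets` is my own, not anything in the API. It uses Python's incremental UTF-8 decoder so that a character split across tokens is only counted once its last byte arrives. Character offsets here are code points, not rendered glyphs, so combining sequences would still need extra handling.

```python
import codecs


def token_offsets(logprob_items):
    """Return (token, byte_offset, char_offset) for each logprobs item.

    Hypothetical helper: assumes each item is a dict with "token" and
    "bytes" keys, as in choices[0]["logprobs"]["content"].
    """
    # The incremental decoder buffers incomplete multi-byte sequences,
    # so a character split across tokens is counted only when complete.
    decoder = codecs.getincrementaldecoder("utf-8")()
    rows = []
    byte_pos = 0
    char_pos = 0
    for item in logprob_items:
        raw = bytes(item.get("bytes") or [])  # treat null bytes as empty
        rows.append((item["token"], byte_pos, char_pos))
        byte_pos += len(raw)
        char_pos += len(decoder.decode(raw))
    return rows


if __name__ == "__main__":
    sample = [
        {"token": "Testing", "bytes": [84, 101, 115, 116, 105, 110, 103]},
        {"token": " successful",
         "bytes": [32, 115, 117, 99, 99, 101, 115, 115, 102, 117, 108]},
        {"token": "!", "bytes": [33]},
    ]
    for row in token_offsets(sample):
        print(row)
```

For plain ASCII like the sample, byte and character offsets come out identical. They diverge as soon as a token contains (or splits) a multi-byte character, which is exactly the case where a server-provided offset would be nicer.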