I’m calling an inference API provided by a third party. For the most part it seems like a 1:1 copy of the completions API minus logprobs (it just sets them all to null).
I have a json_schema response_format with 3 fields, among them a “message”. I’ve noticed that anytime the model is about to produce a list or some other content/markdown, especially right after a colon, it just truncates the response. When I don’t specify a response_format it provides the answer normally. BUT the rest of the JSON object is produced normally, in proper structure; it’s just this one “message” field that gets truncated.
I don’t have any stop sequences, temperature is 0.0, and I have a large max completion tokens limit that is never hit. The same response format also works fine on other inference APIs; the one difference here is that the model I’m using is gpt-oss-120b.
E.g. the response message content below: I can safely parse the whole thing, but the “message” field is incomplete and I can’t get anything after the colon ‘:’ out of it.
"content": "{ \n \"message\": \"Here are the main content\u2011related differences that you need to take into account when moving from BlueSpice\u202f4 to BlueSpice\u202f5:\", ...}
EDIT: Forgot to mention the finish reason is indeed ‘stop’
I’ve seen issues with malformed content before, but then the model usually fails to generate valid JSON at all. In this case the JSON is valid, but a field within the JSON is having its content truncated.
The first thing I’d do: look at the output token count being returned, and compare it to the count you get when you run the returned text through the o200k-harmony tokenizer (likely the same as o200k_base).
Is there much more output being reported than the text you actually see?
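Something along these lines should show the gap, assuming you have the tiktoken package and that o200k_base is close enough to what the model actually uses:

```python
import tiktoken


def check_token_gap(response_json: dict) -> None:
    """Compare tokens visible in the content string against usage.completion_tokens.

    response_json: the parsed body of your /chat/completions call.
    """
    enc = tiktoken.get_encoding("o200k_base")  # assumption: close enough to o200k-harmony
    content = response_json["choices"][0]["message"]["content"]
    visible = len(enc.encode(content))
    reported = response_json["usage"]["completion_tokens"]
    print(f"visible in content: {visible}, reported by usage: {reported}")
    # A large gap means tokens are being generated that never reach the JSON string you parse.
```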
Then log the wire transfer outside of any calling SDK and look at the raw content being returned from the same call.
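If you are going through an SDK, a bare call like this (endpoint, key and prompt are placeholders for whatever your provider expects) lets you see the body exactly as it comes off the wire:

```python
import json
import requests

# Placeholders: substitute your provider's real endpoint, key and prompt.
resp = requests.post(
    "https://api.example.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_KEY"},
    json={
        "model": "gpt-oss-120b",
        "temperature": 0.0,
        "messages": [{"role": "user", "content": "your prompt here"}],
        "response_format": response_format,  # the same json_schema dict you already send
    },
    timeout=120,
)

print(resp.status_code)
print(json.dumps(resp.json(), indent=2))  # raw content, finish_reason and usage, untouched
```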
You have an ellipsis there - are there more JSON keys of structured output continuing after it? (And are the keys to be written ones that should come before the message output, which would enhance planning?)
If it is a single anomaly - the model itself having a tendency to close the string once it has written a colon - you could specify in the output schema that producing a colon (‘ :’, “:”) is forbidden in that JSON field.
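For example, something like this on the schema side - though whether it helps depends on the provider’s grammar engine actually honouring the pattern keyword in strict mode, which is not a given:

```python
# Forbid ':' anywhere in the generated "message" value.
# Only has an effect if the provider's structured-output grammar supports "pattern".
message_property = {
    "type": "string",
    "pattern": "^[^:]*$",
}
```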
So with the response_format I’m getting around 170 output tokens; without it, 489 (a valid response). Seems to indicate it is messing up somehow?
I’m already making the API request myself. I put the ellipsis there to replace other valid JSON; those keys do continue after it. In theory they could be produced before the message, but the information in them all coincides with each other.
It doesn’t seem like a single anomaly either, because I’ve used many different prompts and queries and the same issue appears; at the moment I can’t trigger a response that doesn’t stop right after a colon.
You can try re-sorting the fields in your JSON schema. See if the AI is more likely to continue past a colon in a later key’s value.
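Roughly like this, with “message” moved to the end so the other keys are emitted first (same placeholder names as in your post):

```python
# Same schema, but "message" is declared last so the shorter keys are
# generated before the long free-text value.
reordered_schema = {
    "type": "object",
    "properties": {
        "field_b": {"type": "string"},  # placeholder
        "field_c": {"type": "string"},  # placeholder
        "message": {"type": "string"},  # now last
    },
    "required": ["field_b", "field_c", "message"],
    "additionalProperties": False,
}
```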
If you aren’t paying for unseen tokens (send the entirety of the output string through a tokenizer to check), it does sound like something about the model weights or the strict structured-output context-free-grammar implementation that makes it close JSON strings after a colon. If this is a hard “rule” that cannot be overcome, you can report the issue with structured outputs to the provider.
Well, reordering the response_format keys worked, so I’m even more baffled now, but I’m just going to take it as a win and contact the provider about this in case they have any insights. Thank you so much for your help!