Hi team, I found today that the API spec doesn't support input_text anymore, and I get an error message. I'm wondering why there is no backward compatibility, or even a deprecation message. It's a really bad user experience when the spec changes and my app just stops working because of it.
It makes sense that the assistant role only supports the output_text content type. But was this check in place before? I can't verify it now, but I highly doubt input_text used to be supported for the assistant role.
Update:
On reflection, it doesn't make sense to reject input_text if the Input item list accepts the assistant role, since the only content type it allows there is input_text.
I see. I tried it with the assistant role and confirmed that it does seem to require type="output_text". I usually let the Responses API manage the conversation with previous_response_id, so I don't remember whether this was a recent change.
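To make the two approaches concrete, here is a minimal sketch of the request bodies, built as plain dicts with no network call. It assumes the shape reported in this thread: user turns typed as input_text, replayed assistant turns typed as output_text. The model name and the `resp_123` ID are hypothetical placeholders, and the helper function names are illustrative only.

```python
import json

# Hypothetical model name, used only for illustration.
MODEL = "gpt-4o"


def replayed_history_request(user_1: str, assistant_1: str, user_2: str) -> dict:
    """Resend the whole conversation yourself, typing each text part by role."""
    return {
        "model": MODEL,
        "input": [
            {"role": "user", "content": [{"type": "input_text", "text": user_1}]},
            # Per the posts above, replayed assistant turns appear to need output_text.
            {"role": "assistant", "content": [{"type": "output_text", "text": assistant_1}]},
            {"role": "user", "content": [{"type": "input_text", "text": user_2}]},
        ],
    }


def chained_request(previous_response_id: str, user_turn: str) -> dict:
    """Let the server keep the history; only the new user turn is sent."""
    return {
        "model": MODEL,
        "previous_response_id": previous_response_id,
        "input": [{"role": "user", "content": [{"type": "input_text", "text": user_turn}]}],
    }


if __name__ == "__main__":
    # "resp_123" is a made-up placeholder, not a real response ID.
    print(json.dumps(replayed_history_request("Hi", "Hello! How can I help?", "Write a verse."), indent=2))
    print(json.dumps(chained_request("resp_123", "Write a verse."), indent=2))
```

With previous_response_id you never touch the assistant content types at all, which is probably why the change went unnoticed by people using that style.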
In that case, I agree the API may be becoming unnecessarily strict.
But that probably deserves its own thread, along with questions like "why not stay compatible with Chat Completions?" and other syntax nuances.
For now, I think the documentation should at least specify the correct expected parameters.
Meanwhile, Chat Completions isn't being silly: a 'message' shape is just a message. The only difference is that Chat Completions blocks you from sending image types on the same dang roles where Responses now allows them.
```json
{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "(Verse)\nWell, I woke up this mornin', heart heavy as a stone,\nShe packed her bags and left me."
    }
  ]
}
```
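For anyone migrating stored Chat Completions history, a rough converter might look like the sketch below. It assumes the role-to-type mapping described in this thread (assistant text becomes output_text, everything else input_text); treating system and developer as input_text is my assumption, and the function and table names are hypothetical. Only text parts are handled; images and other part types raise an error rather than being guessed at.

```python
# Hypothetical mapping from role to the Responses text content type (assumption).
TEXT_TYPE_BY_ROLE = {
    "user": "input_text",
    "system": "input_text",
    "developer": "input_text",
    "assistant": "output_text",
}


def chat_message_to_responses_item(message: dict) -> dict:
    """Convert a Chat Completions text message into a Responses input item."""
    role = message["role"]
    text_type = TEXT_TYPE_BY_ROLE[role]
    content = message["content"]
    # Chat Completions also allows a bare string; normalize it to one text part.
    if isinstance(content, str):
        content = [{"type": "text", "text": content}]
    parts = []
    for part in content:
        if part["type"] != "text":
            raise ValueError(f"unsupported part type in this sketch: {part['type']!r}")
        parts.append({"type": text_type, "text": part["text"]})
    return {"role": role, "content": parts}


if __name__ == "__main__":
    chat_msg = {
        "role": "assistant",
        "content": [{"type": "text", "text": "(Verse)\nWell, I woke up this mornin'..."}],
    }
    print(chat_message_to_responses_item(chat_msg))
```

The point of the table is that the only real difference between the two shapes is the type string, which is exactly why the per-role typing feels arbitrary.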
"output_text", of course, is the obvious way to send back AI output gathered by a collector… a collector of ResponseTextDeltaEvent, which is so obviously different from the ResponseRefusalDeltaEvent that you'll also need to send back (yes, go figure out how to send that back…).
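Here is a sketch of what that collector ends up looking like, assuming the streamed events arrive as plain dicts (for instance, the JSON payloads of the server-sent events) with the type names response.output_text.delta and response.refusal.delta. The refusal part shape `{"type": "refusal", "refusal": ...}` is assumed to mirror the output schema; whether the API accepts it back as input is exactly the open question. The function name is hypothetical, and part ordering is simplified.

```python
from typing import Iterable, List


def collect_assistant_content(events: Iterable[dict]) -> List[dict]:
    """Accumulate streamed deltas into typed assistant content parts."""
    text_chunks: List[str] = []
    refusal_chunks: List[str] = []
    for event in events:
        kind = event.get("type")
        if kind == "response.output_text.delta":
            text_chunks.append(event["delta"])
        elif kind == "response.refusal.delta":
            refusal_chunks.append(event["delta"])
        # All other event types are ignored in this sketch.
    content: List[dict] = []
    if text_chunks:
        content.append({"type": "output_text", "text": "".join(text_chunks)})
    if refusal_chunks:
        content.append({"type": "refusal", "refusal": "".join(refusal_chunks)})
    return content


if __name__ == "__main__":
    stream = [
        {"type": "response.output_text.delta", "delta": "Well, I woke "},
        {"type": "response.output_text.delta", "delta": "up this mornin'"},
        {"type": "response.completed"},
    ]
    assistant_item = {"role": "assistant", "content": collect_assistant_content(stream)}
    print(assistant_item)
```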
At that point, why not just shut off assistant messages as input entirely, force you to use only a prior response ID, and require ID verification? Okay, ChatKit it is.
The AI output is a stream of integers (tokens) that can be decoded client-side. Everything else, busting it into a bunch of typed chunks that have to be replayed (or are denied to you altogether), is an insult to intelligence.