I figured out you are using max_tokens: 4000. Try reducing it so that system + previous_conversation + response stays under 4,000 tokens. max_tokens reserves room for the completion, so prompt tokens + max_tokens must fit inside the model's context window, or the API rejects the call with a 400 Bad Request.
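A minimal sketch of that budgeting, assuming the 4,096-token gpt-3.5-turbo context window discussed in this thread, the pre-1.0 `openai` Python client, and `tiktoken` for counting; the message contents are placeholders:

```python
import openai
import tiktoken

MODEL = "gpt-3.5-turbo"
CONTEXT_WINDOW = 4096  # hard limit discussed in this thread

messages = [
    {"role": "system", "content": "You are a helpful assistant."},   # placeholder
    {"role": "user", "content": "Summarize our conversation so far."},  # placeholder
]

# Rough prompt-token count; the real count also includes a few tokens
# of per-message overhead, hence the safety margin below.
enc = tiktoken.encoding_for_model(MODEL)
prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)

# Reserve only what actually fits: prompt + max_tokens must stay
# inside the context window, or the API returns 400 Bad Request.
max_tokens = CONTEXT_WINDOW - prompt_tokens - 50  # 50-token margin

response = openai.ChatCompletion.create(
    model=MODEL,
    messages=messages,
    max_tokens=max_tokens,
)
print(response.choices[0].message.content)
```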
I will try that too. However, I also tried setting max_tokens: null, i.e., using the default value (which I am not sure what it is), and the result was the same Bad Request.
If you set max_tokens: null, that overrides the max_tokens value the client would otherwise send to the API, so the server default applies instead. But the OpenAI hard limit of 4096 tokens cannot be bypassed or overridden by any client-side setting.
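For reference, a sketch of what that looks like at the raw HTTP level, assuming the standard /v1/chat/completions endpoint; sending "max_tokens": null and omitting the field entirely should behave the same, and neither lifts the 4,096-token limit:

```python
import os
import requests

# "max_tokens": None serializes to JSON null, deferring to the server
# default; the 4096-token context limit still applies regardless.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": None,
}

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```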
HTH
Came across this thread and figured I'd chime in, even though the thread is dated. I too am getting the "As a large language model…" style responses more often than I'd like, and I definitely consider this a bug, especially for an API. In my case, my system content is consistently no more than two relatively short sentences, so, at least for me, this has nothing to do with large system content.
Any new findings from anyone else here?