Responses are still very long. I have tried everything from limiting tokens, which every time results in a hard cutoff mid-sentence, to playing with the prompt using “be concise in your response” or “try to respond within 3 sentences or more” and other instructions, but all of these result in poor responses with a hard cutoff. Does anyone know what I can do to limit the AI’s response length WITHOUT the hard cutoff?
My second question is: how do I protect my systems from bad actors? For example, someone who wants to loop the AI into running long responses over and over and make us pay more for the API. More generally, what can I do to protect my infrastructure and token spend from abuse?
What some apps do is simply calculate what the cost of any user request is going to be and deduct it from the user’s credit. I let users purchase more API “credit” any time they want to, so there’s no way anyone can cost me money, because they have to pay into their account before consuming any API usage at all.
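In case it helps to see the idea spelled out, here’s a minimal sketch of that prepaid-credit approach in Python. The per-token prices, the in-memory `balances` dict, and the `charge_request` / `refund_unused` helpers are all made up for illustration; the point is just to reserve the worst-case cost before you ever call the API, and refund whatever wasn’t used afterwards.

```python
# Minimal sketch of a prepaid-credit check. The prices, the in-memory balance
# store, and the helper names are illustrative only; swap in your own billing data.

PRICE_PER_1K_INPUT = 0.0005   # assumed example rate, USD per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.0015  # assumed example rate, USD per 1K output tokens

balances = {"user_123": 2.00}  # prepaid credit in USD, keyed by user id


def estimate_cost(input_tokens: int, max_output_tokens: int) -> float:
    """Worst-case cost if the model uses the full output budget."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (max_output_tokens / 1000) * PRICE_PER_1K_OUTPUT


def charge_request(user_id: str, input_tokens: int, max_output_tokens: int) -> bool:
    """Reserve the worst-case cost up front; refuse the request if the user
    hasn't prepaid enough credit to cover it."""
    cost = estimate_cost(input_tokens, max_output_tokens)
    if balances.get(user_id, 0.0) < cost:
        return False  # not enough prepaid credit, don't call the API at all
    balances[user_id] -= cost
    return True


def refund_unused(user_id: str, max_output_tokens: int, actual_output_tokens: int) -> None:
    """After the call, credit back the difference between the reserved
    worst case and what the model actually generated."""
    unused = max(max_output_tokens - actual_output_tokens, 0)
    balances[user_id] += (unused / 1000) * PRICE_PER_1K_OUTPUT


if __name__ == "__main__":
    if charge_request("user_123", input_tokens=350, max_output_tokens=2000):
        # ... call the model here, then read actual token usage from the response ...
        refund_unused("user_123", max_output_tokens=2000, actual_output_tokens=800)
    print(balances)
```

Because nothing reaches the model unless the reservation succeeds, a bad actor looping requests only burns through credit they have already paid for.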
I’ve never had any problems with responses getting cut off, but most of the time I’m asking questions that only take 3 to 6 paragraphs as a response, and I think I’m setting max_tokens to around 2000.
Have you tried using the system prompt to say something like “Keep your responses to 3 paragraphs or less”? I noticed you said “3 sentences or more”, and the “or more” part may be confusing it. See the sketch below for how I’d split the two controls.
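To make the split concrete: put the length instruction in the system prompt and keep max_tokens as a generous safety net rather than the main control, so the model plans a short answer instead of being chopped off. This is a rough sketch assuming the OpenAI Python SDK (since you mentioned max_tokens); the model name and example question are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name, use whatever you're on
    messages=[
        # The length instruction lives in the system prompt, so the model
        # aims for a short answer instead of being truncated mid-sentence.
        {"role": "system",
         "content": "Keep your responses to 3 paragraphs or less. Be concise."},
        {"role": "user",
         "content": "Explain how rate limiting works."},  # example question
    ],
    # max_tokens stays well above the length asked for in the prompt, so it
    # only acts as a backstop and rarely causes a hard cutoff.
    max_tokens=2000,
)

print(response.choices[0].message.content)
# finish_reason tells you whether the cap was hit ("length") or the model
# stopped on its own ("stop"), which is handy for spotting cutoffs in logs.
print(response.choices[0].finish_reason)
```

If finish_reason comes back as “length” often, the prompt instruction isn’t being followed and it’s worth tightening the wording rather than lowering max_tokens.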