We are using GPT-3.5 Turbo for Q&A. Along with the context, we also ask it to suggest the next two prompts for the user, based on the first question they asked. It was working fine: we got the answer and saved it in the DB. But once we implemented streaming, it is no longer clear where the first answer stops and where the follow-up questions begin. We tried instructing the model to add a special character, line break, etc., but it isn't foolproof. Has anyone managed this before? We tried sending two API calls; that works, but the cost doubles, so it won't do. Please suggest if you have handled this before or have suggestions in general.
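One client-side approach, assuming you can get the model to emit a fixed marker line between the answer and the suggestions (the marker string, chunk shapes, and function names below are all hypothetical): buffer the stream just enough to detect a marker that may be split across chunks, forward the answer as it arrives, and collect everything after the marker for the DB.

```python
# Hypothetical sketch: split a streamed completion into the answer
# (forwarded to the user as it arrives) and follow-up suggestions
# (collected for the DB), using an agreed marker string. The marker
# text and the chunking are assumptions, not part of any API.

MARKER = "###FOLLOWUPS###"

def split_stream(chunks, emit, marker=MARKER):
    """Forward answer text via emit() as it arrives; return follow-up text.

    `chunks` is any iterable of text fragments (e.g. the delta.content
    values from a streaming chat completion). Because the marker may be
    split across chunks, we hold back a tail shorter than the marker
    before deciding what is safe to emit.
    """
    buffer = ""
    followups = []
    in_followups = False
    for chunk in chunks:
        if in_followups:
            followups.append(chunk)
            continue
        buffer += chunk
        if marker in buffer:
            answer, _, rest = buffer.partition(marker)
            if answer:
                emit(answer)
            followups.append(rest)
            in_followups = True
            buffer = ""
        else:
            # Emit everything except a tail that could still be the
            # start of a partial marker.
            safe = len(buffer) - (len(marker) - 1)
            if safe > 0:
                emit(buffer[:safe])
                buffer = buffer[safe:]
    if buffer:
        emit(buffer)  # no marker ever appeared
    return "".join(followups).strip()
```

This keeps streaming latency low (only a ~14-character tail is ever held back) while still working when the marker arrives split across two chunks.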
Are you making use of the role sections, i.e. system, user, assistant? Can you post snippets of your API-calling code and the code that builds the prompts, please?
“Implemented streaming”? The AI doesn’t perceive that the response tokens are being streamed to the user. Set top_p=0.0001 and you’ll observe the exact same answer to the same input, streamed or not.
gpt-3.5-turbo is now very poor at following “make multiple outputs”-type system prompts. You can add a post-prompt section in the user role, “## required output”, and describe there the list of two generations that should be produced, with part two starting with a fixed “prefix”.
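A minimal sketch of that layout, assuming a hypothetical context string and prefix wording (none of the text below is a fixed API convention; the point is only that the format instructions sit at the end of the user message and name an exact prefix the client can split on):

```python
# Hypothetical prompt layout: output instructions live in a
# "## required output" section appended to the user message, and the
# follow-up block is introduced by a fixed prefix line. All wording
# here is an assumption to illustrate the structure.

FOLLOWUP_PREFIX = "SUGGESTED QUESTIONS:"

def build_messages(context, question):
    user_content = (
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "## required output\n"
        "1. The answer to the question.\n"
        f"2. A section starting with the exact line \"{FOLLOWUP_PREFIX}\", "
        "containing two follow-up questions the user might ask next, "
        "one per line.\n"
    )
    return [
        {"role": "system", "content": "You are a helpful Q&A assistant."},
        {"role": "user", "content": user_content},
    ]
```

Because the prefix line is fixed and exact, the client can split the (streamed) output on it instead of hoping the model improvises a reliable separator.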
You can also multi-shot the AI: give it a chat history containing a successful input and output before your “user” asks the real question.
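A sketch of that multi-shot setup, with a fabricated prior exchange placed before the real question so the model imitates its exact shape (the example texts, prefix line, and function name are assumptions for illustration):

```python
# Hypothetical multi-shot message list: one fabricated successful
# user/assistant turn, in exactly the output format we want back,
# followed by the real question.

def build_multishot(system_prompt, question):
    return [
        {"role": "system", "content": system_prompt},
        # Fabricated prior turn demonstrating the desired format.
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": (
            "The capital of France is Paris.\n"
            "SUGGESTED QUESTIONS:\n"
            "What is the population of Paris?\n"
            "Which river runs through Paris?")},
        # The real question goes last.
        {"role": "user", "content": question},
    ]
```

Showing the model one finished answer-plus-suggestions pair is usually a stronger signal than describing the format in prose, especially for gpt-3.5-turbo.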