Completion cut off and unknown characters

Hi everyone!

I am relatively new to GPT-3, so my apologies if my questions seem trivial. I have looked through existing questions, and although some are quite similar, none of them directly answered mine.

I have 2 questions:

  1. How can I prevent the system from generating incomplete (cut-off) completions, independent of the max tokens setting? I am using GPT-3 from Java in my app. Would it be possible, and a better option, to post-process the generated completion before displaying it to the user, truncating it to remove any extra words after the last “.”?

  2. My fine-tuned model occasionally generates text containing characters I don’t recognize (they look similar to Chinese, but I don’t speak Chinese), even though my dataset was entirely in English. Does anyone know the reason for this and how I can solve it?

Thanks in advance!

Hi, regarding your first question, I have only experienced the problem of “incomplete completions” when bumping up against the specified maximum tokens, so make sure you are being generous enough with your maximum. To ensure that I got a “complete completion” within my specified maximum tokens, I found it helpful to include explicit instructions to that effect in the prompt, something like this: “summarize details to provide an answer in 2-3 succinct sentences.” Sorry, I have no idea about your second question.
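
For the post-processing idea from the first question, here is a minimal Java sketch that cuts a completion back to its last sentence terminator. The class and method names (CompletionTrimmer, trimToLastSentence) are made up for illustration and are not part of any OpenAI library. It assumes that “.”, “!” and “?” always end a sentence, which is not true for abbreviations or decimals such as “e.g.” or “3.14”.

```java
// Hypothetical helper, not part of any OpenAI client library.
final class CompletionTrimmer {
    private CompletionTrimmer() {}

    /**
     * Returns the text up to and including the last '.', '!' or '?'.
     * If no terminator is found, the trimmed input is returned unchanged.
     */
    static String trimToLastSentence(String completion) {
        String text = completion.trim();
        // Scan backwards for the last sentence terminator.
        for (int i = text.length() - 1; i >= 0; i--) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                return text.substring(0, i + 1);
            }
        }
        return text;
    }
}
```

For example, trimToLastSentence("The cat sat. The dog") would give "The cat sat.". As far as I know, the completion response also reports a finish_reason of "length" when the max tokens limit was hit, so the trimming could be applied only in that case.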

My fine-tuned model keeps generating text until it reaches the specified maximum number of tokens. For my purposes, however, one generated sentence is enough, so I use “.” as the stop sequence. It is working fine for now. Thank you for your help!
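
For anyone else doing this from Java, a rough sketch of the stop-sequence approach follows. It only builds the JSON request body and tidies the returned text; the HTTP call is left out. The class and method names are hypothetical, the model name is a placeholder you pass in, and the hand-rolled escaping is deliberately minimal (a real app would use a JSON library). The request fields used are "model", "prompt", "max_tokens" and "stop". Note that the returned text does not include the stop sequence itself, so the “.” has to be added back.

```java
// Hypothetical helper for illustration; not an official client.
final class StopSequenceExample {
    private StopSequenceExample() {}

    /** Builds a completion request body that stops at the given sequence. */
    static String buildRequestBody(String model, String prompt, int maxTokens, String stop) {
        return "{"
            + "\"model\": " + quote(model) + ", "
            + "\"prompt\": " + quote(prompt) + ", "
            + "\"max_tokens\": " + maxTokens + ", "
            + "\"stop\": " + quote(stop)
            + "}";
    }

    /** Minimal JSON string escaping (quotes, backslashes, common whitespace). */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Re-appends the stop sequence, which the API omits from the returned text. */
    static String restoreStop(String completion, String stop) {
        String text = completion.trim();
        if (text.isEmpty() || text.endsWith(stop)) {
            return text;
        }
        return text + stop;
    }
}
```

Keeping max_tokens comfortably above the length of one sentence means the stop sequence, rather than the token limit, is what ends the completion.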