First, you can press the ‘view code’ button to confirm you are sending the correct parameters. Then, whatever those show, crank the max_tokens value, which caps how much output you can receive, way up. In my case you can see the truncation happening at 128 tokens.
Then look at the finish_reason in the response. If it is “stop”, the AI simply decided to stop writing; if it is “length”, the output was cut off by your max_tokens setting.
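As a minimal sketch of checking this yourself, assuming the openai Python SDK (v1+) and a placeholder model name, it looks like this:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: swap in whichever model you are testing
    messages=[{"role": "user", "content": "Write a long story."}],
    max_tokens=2048,      # cranked up so the API itself isn't the limit
)

choice = resp.choices[0]
print(choice.finish_reason)   # "length" = truncated by max_tokens; "stop" = model chose to end
print(choice.message.content)
```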
Another oddity about this model is that specifying temperature and top_p values that are too low doesn’t act like a setting of zero; it acts like a setting of “useless”. Perhaps the parsing of negative exponents is screwed up, or the math is done with reduced-precision bits that overflow. So a setting like 0.0001, which still allows a little variance, is needed to get moderately replicable outputs.
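For example, here is a quick sketch of that workaround (again assuming the openai Python SDK and a placeholder model name); two runs at a tiny-but-nonzero temperature should match almost every time:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0001,   # tiny but nonzero; values too close to 0 misbehave here
        top_p=0.0001,         # same idea for nucleus sampling
        max_tokens=16,
    )
    return resp.choices[0].message.content

# Compare two runs: they should now be moderately replicable.
print(ask("Name one color."))
print(ask("Name one color."))
```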