Current OpenAI models have a fairly low limit on how much text they can output in a single response.
GPT-4 models accept up to 128k tokens as a context window (the total of input + output), but can only output up to 4,096 tokens per response. If you want more output, you have to send another request, which roughly doubles the tokens you’re using, since the follow-up counts as another generation and re-sends the context.
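Here’s a rough sketch of that manual continuation loop with the openai Python SDK (the model name and prompt are placeholders, not your exact setup). The API sets `finish_reason == "length"` when a response was cut off by the output cap, so you can loop until that stops happening:

```python
from openai import OpenAI

client = OpenAI()

messages = [
    # Your transcript goes in the prompt
    {"role": "user", "content": "Fix and organise this transcript: ..."},
]

chunks = []
while True:
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder model name
        messages=messages,
        max_tokens=4096,      # the per-response output cap
    )
    choice = response.choices[0]
    chunks.append(choice.message.content)
    # finish_reason == "length" means the model hit the output cap mid-answer
    if choice.finish_reason != "length":
        break
    # Feed the partial answer back and ask for the rest
    messages.append({"role": "assistant", "content": choice.message.content})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

full_text = "".join(chunks)
print(full_text)
```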
One token equals roughly 4 characters of English text (non-English text often packs fewer characters into each token), so that fits your description - it just sounds like you’re giving the Assistant more text as input than it can output in one go.
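If you want to check the numbers for your own transcript before sending anything, you can count tokens locally with OpenAI’s tiktoken library. Quick sketch (the file name is just an example):

```python
import tiktoken

# Pick the tokenizer that matches the model you're calling
enc = tiktoken.encoding_for_model("gpt-4")

with open("transcript.txt") as f:  # example file name
    transcript = f.read()

tokens = enc.encode(transcript)
print(f"{len(transcript)} characters -> {len(tokens)} tokens")
print(f"~{len(transcript) / len(tokens):.1f} characters per token")
```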
One possible workaround to automate this would be to give the Assistant a custom function it can call at the end of a generation if it realises it hasn’t finished fixing/organising the transcript yet. Your code would then return a function output instructing it to keep going and pick up where it left off; a rough sketch is below.
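This is what that pattern could look like with the Assistants API in the openai Python SDK. The `continue_transcript` function name, the instructions, and the model are all illustrative choices on my end, not something built into the API:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical "I'm not done yet" function the Assistant can call
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",  # placeholder model name
    instructions=(
        "Fix and organise the transcript the user sends. If you run out of "
        "room before you finish, call continue_transcript instead of stopping."
    ),
    tools=[{
        "type": "function",
        "function": {
            "name": "continue_transcript",
            "description": "Signal that the transcript is not fully processed yet.",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
)

thread = client.beta.threads.create(
    messages=[{"role": "user", "content": "Here is the transcript: ..."}],
)

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

# Each time the Assistant calls the function, tell it to keep going
while run.status == "requires_action":
    calls = run.required_action.submit_tool_outputs.tool_calls
    run = client.beta.threads.runs.submit_tool_outputs_and_poll(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=[
            {"tool_call_id": c.id, "output": "Not finished - continue where you left off."}
            for c in calls
        ],
    )

# The Assistant's messages on the thread together form the full output
for msg in client.beta.threads.messages.list(thread_id=thread.id, order="asc"):
    if msg.role == "assistant":
        print(msg.content[0].text.value)
```

You’d still pay for each continuation run, but at least you wouldn’t have to babysit the process and re-prompt by hand.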