We are using the Whisper API to transcribe audio files, which can be quite long. The Whisper transcription works fine, and we then run the transcript through the Assistants API for post-processing to fix some specific issues with the transcript and organize it better.
The problem I’m having is that the Assistants API’s message back to me is being cut off. For example, the raw transcript might be 50,000 characters, and the Assistant usually stops the formatted response around 20,000 characters. I have tried both the streaming and the non-streaming response, and both behave the same way, stopping around 20,000 characters.
The response the Assistant generates is great up until it stops. Is there a way I can receive the full response?
Current OpenAI models have a pretty low limit when it comes to outputting text.
GPT-4 Turbo models can take up to 128k tokens as a context window (the total input + output), but can only output up to 4,096 tokens per response. If you want more output, you have to send another request, which roughly doubles the tokens you’re using, since the whole context is billed again for the second generation.
One token is roughly 4 characters of English text, so 4,096 tokens works out to around 16,000 characters, which lines up with where your responses stop. It sounds like you’re simply giving the Assistant more text as input than it can output in one go.
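For a concrete picture of the “send another request” part: with plain Chat Completions you can check the `finish_reason` and, whenever it comes back as `"length"`, feed the partial answer into the next request and ask for a continuation. This is only a minimal sketch, assuming the openai v1.x Python SDK and a `gpt-4-turbo` model name; `raw_transcript` stands in for your Whisper output:

```python
# Minimal continuation loop for Chat Completions (openai v1.x SDK assumed).
# raw_transcript is a placeholder for the text you got back from Whisper.
from openai import OpenAI

client = OpenAI()
raw_transcript = "..."  # your Whisper output goes here

messages = [
    {"role": "system", "content": "Fix and organize the transcript you are given."},
    {"role": "user", "content": raw_transcript},
]

parts = []
while True:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        max_tokens=4096,
    )
    choice = response.choices[0]
    parts.append(choice.message.content)
    if choice.finish_reason != "length":
        break  # the model finished on its own
    # The output was truncated: send it back and ask the model to continue.
    messages.append({"role": "assistant", "content": choice.message.content})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

full_text = "".join(parts)
```

Note that each loop iteration re-sends the full conversation so far, which is where the doubled token usage comes from.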
One possible workaround to automate this would be to give the Assistant a custom function it can call at the end of a generation if it realises it hasn’t finished fixing/organising the transcript yet. Your code would then return a function output instructing it to keep going and pick up where it left off, as in the rough sketch below.
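A minimal sketch of that idea with the Assistants API (beta endpoints in the openai v1.x Python SDK). The tool name `continue_transcript` and the instruction wording are my own invention for illustration, not anything the API prescribes:

```python
# Sketch of the function-calling workaround with the Assistants API (beta).
# The tool name "continue_transcript" is made up for illustration.
from openai import OpenAI

client = OpenAI()
raw_transcript = "..."  # your Whisper output goes here

assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    instructions=(
        "Fix and organise the transcript you are given. If you run out of room "
        "before you finish, call continue_transcript instead of stopping."
    ),
    tools=[{
        "type": "function",
        "function": {
            "name": "continue_transcript",
            "description": "Call this when the cleaned transcript is not yet complete.",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
)

thread = client.beta.threads.create(
    messages=[{"role": "user", "content": raw_transcript}]
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

# Each time the Assistant calls the tool, tell it to keep going.
while run.status == "requires_action":
    tool_call = run.required_action.submit_tool_outputs.tool_calls[0]
    run = client.beta.threads.runs.submit_tool_outputs_and_poll(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=[{
            "tool_call_id": tool_call.id,
            "output": "Not finished. Continue exactly where you left off.",
        }],
    )

# The cleaned transcript is spread across the assistant messages in the thread.
messages = client.beta.threads.messages.list(thread_id=thread.id, order="asc")
```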
I am not a huge fan of Gemini (so far), but long outputs are something Gemini 1.5 handles really well, so consider using Gemini for the Whisper post-processing?
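If you go that route, a minimal sketch with Google’s `google-generativeai` SDK might look like the following. Gemini 1.5 Pro takes a very large context (on the order of a million tokens) and allows up to 8,192 output tokens per response; treat the model name and config here as assumptions to verify against the current docs:

```python
# Minimal sketch using Google's google-generativeai SDK (assumed API surface;
# check the current docs). raw_transcript is a placeholder for Whisper output.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
raw_transcript = "..."  # your Whisper output goes here

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Fix and organize this transcript:\n\n" + raw_transcript,
    generation_config={"max_output_tokens": 8192},
)
print(response.text)
```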