I am working with audio files from phone conversations and using Google Speech-to-Text to convert them into text. Each conversation includes the order of the sentence, the speaker, the time of speaking, and the content of the conversation. For example:
1, Speaker1, 0, Uh, yeah, so, I was thinking, um, maybe we could, like, uh, you know, start the project, uh, next week?
2, Speaker2, 6, Um, yeah, I guess, but, uh, do we have, like, all the resources we need, or, um, are we, like, still missing some stuff?
I want to use OpenAI to edit this content by removing filler words like “uh,” “um,” “like,” improving grammar, and correcting spelling mistakes. The result I am expecting is something like this:
1, Speaker1, 0, I was thinking maybe we could start the project next week?
2, Speaker2, 6, I guess, but do we have all the resources we need, or are we still missing some stuff?
Due to the output length limit (max tokens), I provide the entire conversation to OpenAI but only ask for a specific part, such as “Please return lines 1 to 50.” Initially, OpenAI worked well, but after some time, it sometimes only returns lines 1 to 20, which causes errors in my program.
When OpenAI returns the wrong line numbers, I have tried resending the request, such as: “You are returning the wrong results. Please return from line 1 to line 50.” However, sometimes resending the request is not effective.
The OpenAI API I have used: chat completion: https://platform.openai.com/docs/api-reference/chat/create
I used gpt-4o model
How can I ensure that OpenAI always returns the exact number of lines requested? Or is there another approach to ensure the text is edited as desired?