Just for fun, I decided to create an application, assisted by OpenAI, that takes a video and turns it into a readable article. I used the `audio/transcriptions` endpoint to transcribe the video, and the `chat/completions` endpoint to generate a header for the article. My issue is that almost the entire transcription comes back as one solid block of text. It added line breaks at the beginning, but after that it's just a solid chunk. You can see the transcription here: Open AI transcription - Pastebin.com

How can I use OpenAI to fix this? Can I tweak the `audio/transcriptions` call somehow? Can I use some other endpoint to insert line breaks?
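For reference, my two calls look roughly like this (a minimal sketch using the Node `openai` v3 SDK; the file name and the header prompt are placeholders):

```ts
import fs from "fs";
import { Configuration, OpenAIApi } from "openai";

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

// 1. Transcribe the extracted audio track with Whisper.
const transcription = await openai.createTranscription(
  fs.createReadStream("video-audio.mp3") as any, // the SDK's types expect File
  "whisper-1"
);

// 2. Ask chat/completions for an article header based on the transcript.
const header = await openai.createChatCompletion({
  model: "gpt-3.5-turbo",
  messages: [
    {
      role: "user",
      content: `Write a short, descriptive headline for this article:\n\n${transcription.data.text}`,
    },
  ],
});

console.log(header.data.choices[0].message?.content);
```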
I ran into the same problem using completions: the text to "translate" (in my use case, to correct for grammar and spelling) is always returned as a single block of text, regardless of paragraph breaks in the input.

My prompts, using text-davinci-003, are "Correct this to standard English" or "Correct this to informal English." I've added "and preserve paragraphs" and other variations, to no effect.
Attempts to place markers in the text (particles, bracketed or escaped code, whole sentences) for subsequent replacement are also defeated: the returned text strips those out, too.
The only solution I've found so far has been to "explode" the original text into an array at the paragraph breaks, send a request for each paragraph, then re-assemble the text. The problem then is that the translations/corrections are of lower quality, presumably because each paragraph no longer benefits from the surrounding context: I've seen clear mistakes in the original get caught in the whole-block submission but missed in the paragraph-by-paragraph submissions. A minimal sketch of this workaround is below.
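Here is roughly what that explode/re-assemble workaround looks like (Node `openai` v3 SDK; the function name and token limit are my own choices):

```ts
import { Configuration, OpenAIApi } from "openai";

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

async function correctPreservingParagraphs(text: string): Promise<string> {
  // Split on blank lines, correct each paragraph on its own, then re-join
  // with the original paragraph breaks. Trades correction quality for layout.
  const paragraphs = text.split(/\n\s*\n/);
  const corrected: string[] = [];
  for (const paragraph of paragraphs) {
    const res = await openai.createCompletion({
      model: "text-davinci-003",
      prompt: `Correct this to standard English:\n\n${paragraph}`,
      max_tokens: 1024,
      temperature: 0,
    });
    corrected.push(res.data.choices[0].text?.trim() ?? paragraph);
  }
  return corrected.join("\n\n");
}
```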
If anyone has a solution for conveying paragraph breaks from the original through to the returned text, I'd be interested in hearing it.
You can request the `verbose_json` response format; the response will then be typed as `AxiosResponse<VerboseJSONTranscriptionResponse>`, per the definitions below. You can then use the segment information to arrange the response; a sketch follows the type definitions.
```ts
export type TranscriptionSegment = {
  id: number;
  seek: number;
  start: number;
  end: number;
  text: string;
  tokens: number[];
  temperature: number;
  avg_logprob: number;
  compression_ratio: number;
  no_speech_prob: number;
  transient: boolean;
};

export type VerboseJSONTranscriptionResponse = {
  task: "transcribe";
  language: string;
  duration: number;
  segments: TranscriptionSegment[];
  text: string;
};
```
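A minimal sketch of how you could use those segments, assuming the Node `openai` v3 SDK (the file name and the 1.5-second pause threshold are placeholders of mine, not part of the API):

```ts
import fs from "fs";
import { AxiosResponse } from "axios";
import { Configuration, OpenAIApi } from "openai";

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

// Request verbose_json so the response carries per-segment timing.
const res = (await openai.createTranscription(
  fs.createReadStream("video-audio.mp3") as any, // the SDK's types expect File
  "whisper-1",
  undefined,      // prompt
  "verbose_json"  // response_format
)) as AxiosResponse<VerboseJSONTranscriptionResponse>;

// One segment per line; treat a pause longer than ~1.5 s between segments
// as a paragraph break (a heuristic, not something the API reports).
let article = "";
let previousEnd = 0;
for (const segment of res.data.segments) {
  if (article.length > 0) {
    article += segment.start - previousEnd > 1.5 ? "\n\n" : "\n";
  }
  article += segment.text.trim();
  previousEnd = segment.end;
}
```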
If that's still not good enough, there are third-party services such as Deepgram, which offer more advanced processing and formatting of the responses, built on top of Whisper or other transcription models.
Thanks for your reply, but things probably got confused because the original question referred to transcription, while I've been trying the completions and edits endpoints for other purposes.

Also, the OpenAI Playground uses a completions-based example for grammar correction.
Though `verbose_json` is not an option for either completions or edits, I've found that by switching to the edits endpoint I can preserve paragraphing. In fact, the model seems to add new line breaks where none are present in the original. I'm still experimenting with different instructions; a sketch of the call is below. I'm also getting a higher frequency of timeouts, but that may be the time of day or some other network issue.
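For anyone following along, the edits call that preserved paragraphing for me looks roughly like this (Node `openai` v3 SDK; the sample input is a placeholder):

```ts
import { Configuration, OpenAIApi } from "openai";

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

const originalText = "First paragraph...\n\nSecond paragraph...";

// The edits endpoint takes the text as `input` and the task as `instruction`;
// in my tests it kept (and sometimes added) paragraph breaks.
const res = await openai.createEdit({
  model: "text-davinci-edit-001",
  input: originalText,
  instruction: "Correct this to standard English.",
  temperature: 0,
});

const corrected = res.data.choices[0].text;
```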
Depending on how things go, I might start my own thread on this question.
Yes, I was talking about transcription, to be clear.