Transcript cleanup - o3, o3-mini, o4-mini

I need to filter thousands of pages of transcripts to improve my videos, and I cannot use o3 due to its rate limits and comparatively high cost.

However, I have been experimenting with o4-mini and o3-mini. o3-mini gives transcript cleanups about 15% longer than o3's, and o4-mini gives cleanups about 10% shorter. However, o4-mini cuts out too much detail in my opinion. I am not sure whether further prompt experimentation with o4-mini is worth it, since it consistently drops more detail when I try to force it to format responses like o3. For instance, I prompted: "You are given a block of raw source text. Transform it into a concise, well-organized document by analyzing the transcript and applying the following structure to maximize brevity and clarity:

  • Add a title based on the lesson content.
  • Use bulleted lists and subheadings to chunk information into bite-sized pieces.
  • Use numbered steps with consistent “Step 1, 2, 3…” labels to make the flow explicit.
  • Use generic “Approach A vs. Approach B” labels in place of repeated descriptive paragraphs.
  • Use parallel structure in examples (e.g. “Part → link back → next part”) to highlight the pattern.
  • Visual aids (optional): insert simple ASCII diagrams, flowcharts, or arrows if they clarify relationships.

When rephrasing the transcript ensure clarity and brevity by:

  • Eliminating filler words, redundancies, speaker labels, timestamps, false starts, stutters, and disfluencies.
  • Keeping sentences short and purposeful.
  • Ensuring active-voice constructions (“You start with…,” “You ask: Is this relevant?”) focus on the reader’s actions.
  • Ensuring transitional phrases (“By bouncing between…,” “Quick sketch,” “That question forces…”) guide the reader through each shift.
  • Pruning unnecessary qualifiers and asides to keep only the essential logic."

That gave me an ultra-concise 400-word output where o3 gave an 855-word output. o3-mini gave a 988-word output without the formatting rules, but it lacked the explicit attention to detail that I require. I am new to API programming and have finished coding something for batch processing; the only thing standing in the way is quality control. By the way, these tests were done in the Playground, and if I posted this under the wrong tag, please tell me where to post next time.
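For context, here is a minimal sketch of the batch setup I'm building (the model name, prompt placeholder, and word-count tolerance are stand-ins for my actual values; the payload shape follows the Chat Completions API):

```python
# Placeholder for the full cleanup prompt quoted above.
CLEANUP_PROMPT = "You are given a block of raw source text. ..."

def build_request(transcript: str, model: str = "o3-mini") -> dict:
    """Build a Chat Completions payload for one transcript chunk."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": CLEANUP_PROMPT},
            {"role": "user", "content": transcript},
        ],
    }

def passes_qc(output_text: str, reference_words: int, tolerance: float = 0.25) -> bool:
    """Flag outputs that drift too far from a reference word count.

    Against the 855-word o3 baseline, a 400-word output (-53%) fails
    at a 25% tolerance, while a 988-word output (+16%) passes.
    """
    words = len(output_text.split())
    return abs(words - reference_words) / reference_words <= tolerance
```

A word-count gate like this obviously won't catch dropped *details*, but it cheaply flags the over-compressed outputs for manual review.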


Hi and welcome to the developer community forum!

Have you tried a model like GPT-4.1? It's a very capable, inexpensive model and might give you a third option to experiment with.

Welcome to the community @hallucinigenic101!

I second the suggestion made by @Foxalabs to use the GPT-4.1 model. It’s not a reasoning model, but for your use case, it might just fit the bill. If the input content (i.e., your transcripts) is too unstructured, you could also look at cleaning it up for use with this model. Your prompt looks good, and actually follows some of the guidelines laid out here in the official GPT-4.1 prompting guide, so I would say just try swapping the model without any additional changes and compare the results.
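On the pre-cleaning point: much of your prompt's first cleanup bullet (timestamps, speaker labels) can be handled deterministically before the model ever sees the text. A rough sketch, assuming the transcripts use "[00:01:23]" timestamps and all-caps "SPEAKER:" labels at the start of lines (adjust the patterns to your actual export format):

```python
import re

# Assumed conventions: "[MM:SS]" or "[HH:MM:SS]" timestamps and
# all-caps "SPEAKER:" labels at the start of a line.
TIMESTAMP = re.compile(r"\[\d{2}:\d{2}(?::\d{2})?\]")
SPEAKER = re.compile(r"^\s*[A-Z][A-Z ]*:\s*", re.MULTILINE)

def preclean(raw: str) -> str:
    """Strip timestamps and speaker labels before sending to the model."""
    text = TIMESTAMP.sub("", raw)
    text = SPEAKER.sub("", text)
    # Collapse runs of spaces/tabs left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()
```

Doing this yourself also saves input tokens and leaves the model's budget for the actual restructuring work.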

As you become more familiar with API calls, you may find that these two parameters are of interest to you for this use case:

  1. Temperature: Lower this value to make outputs more focused and deterministic
  2. Max Output Tokens: If there’s a certain word count you’re targeting, then set this parameter to the equivalent token count. You could use this OpenAI tokenizer to estimate the token count for any summaries that you have already generated.
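As a rough sketch of how those two parameters might look in a request (the parameter name for the token cap varies by endpoint, e.g. `max_tokens` vs. `max_output_tokens`, and the ~0.75 words-per-token ratio is a common rule of thumb for English, not an exact conversion):

```python
def words_to_tokens(word_count: int) -> int:
    """Rough heuristic: one token is about 0.75 English words."""
    return round(word_count / 0.75)

# Target roughly the 855-word length that o3 produced.
params = {
    "temperature": 0.2,                   # lower = more focused, deterministic
    "max_tokens": words_to_tokens(855),   # hard cap on output length
}
```

One caveat: the token cap truncates the output rather than steering it shorter, so the prompt itself still needs to request the target length; use the tokenizer mentioned above to refine the estimate.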

Hope this is helpful.