Prompt Fatigue Question For API Calls

hugebelts · January 14, 2025, 10:56pm

I say: it may be. If the result is still the same after this, then something else may be causing the problem.

P.S.: Above this are suggestions. And a description.

_j · January 14, 2025, 11:15pm

Problems Identified

Non-Semantic Splitting: Splitting purely by word count disregards paragraph or sentence boundaries. Chunks may end abruptly, leaving sentences or paragraphs incomplete.
Token Limit Estimation: There is a crude algorithm that does math on the maximum response tokens, with no actual token measurement (measuring would also require iterating to reach a desired length).
Excessive Chunk Size: Setting the chunk size to MAX_TOKENS // 3 does not optimize for the ideal input length, as larger chunks can lead to degraded AI understanding.
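
To illustrate the first problem, here is a minimal sketch of splitting purely by word count. The helper name naive_word_split and the sample text are made up for this example; it is not the original poster's code.

```python
def naive_word_split(text, chunk_words):
    """Split purely by word count, ignoring sentence boundaries."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]

sample = "The meeting started late. Nobody minded. We covered the budget first."
naive_chunks = naive_word_split(sample, 5)
for chunk in naive_chunks:
    # Each chunk boundary falls wherever the word count runs out
    print(repr(chunk))
```

The first chunk ends with the first word of the second sentence, so the model sees that sentence cut in two.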

Improved Chunking Strategy

A more effective strategy involves semantic-aware splitting:

Primary Split Criterion: Split by paragraphs if the transcript is formatted as such.
Fallback Criterion: Split by sentences if paragraphs are too long or not clearly defined.
Ideal Chunk Size: Aim for chunks between 500–1000 words (around 1500 tokens) for optimal AI understanding. The code counts words directly, which avoids confusing words with tokens.

Improved Implementation

Below is also some bot-written code after a few iterations.

You can put back the framing messages after making them actionable and not word spew.

The Python code below implements semantic-aware chunking using word counts, with a complete workflow: preparing the chunks, processing them via the API, logging the inputs and outputs for diagnostics, and reassembling the final document.

The Code

import re

# Temporary diagnostic files
CHUNK_INPUT_FILE = "chunk_input.txt"
LOG_FILE = "chunk_log.txt"

# Function to split text by paragraphs
def split_by_paragraphs(text):
    """Split text into paragraphs using double newlines as delimiters."""
    paragraphs = text.split("\n\n")
    return [para.strip() for para in paragraphs if para.strip()]

# Function to split text into sentences
def split_by_sentences(text):
    """Split text into sentences using punctuation delimiters."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text) if s]

# Function for semantic-aware splitting
def semantic_chunk_split(text, target_words=500, max_words=800, hard_limit=1000):
    """
    Split text semantically with priorities:
    - Paragraph splits at ~500 words.
    - Sentence-level splits up to ~800 words if no paragraph boundary found.
    - Hard split at ~1000 words if necessary.
    """
    paragraphs = split_by_paragraphs(text)
    chunks = []
    current_chunk = []
    current_word_count = 0

    for para in paragraphs:
        para_word_count = len(para.split())

        # Check if adding this paragraph would exceed the max word count
        if current_word_count + para_word_count > max_words:
            # If target length not reached, split by sentences
            if current_word_count < target_words and para_word_count <= max_words:
                sentences = split_by_sentences(para)
                for sentence in sentences:
                    sentence_word_count = len(sentence.split())
                    if current_chunk and current_word_count + sentence_word_count > max_words:
                        # Close the chunk at a sentence boundary once max_words would be exceeded
                        chunks.append({
                            "content": " ".join(current_chunk).strip(),
                            "split_at": "sentence"
                        })
                        current_chunk = [sentence]
                        current_word_count = sentence_word_count
                    else:
                        current_chunk.append(sentence)
                        current_word_count += sentence_word_count
            else:
                # Save the current chunk (skipping an empty one) and start a new one
                if current_chunk:
                    chunks.append({
                        "content": " ".join(current_chunk).strip(),
                        "split_at": "paragraph"
                    })
                current_chunk = [para]
                current_word_count = para_word_count
        else:
            # Add paragraph to the current chunk
            current_chunk.append(para)
            current_word_count += para_word_count

        # Flush the chunk once it exceeds the hard limit (this does not cut inside a paragraph)
        if current_word_count > hard_limit:
            chunks.append({
                "content": " ".join(current_chunk).strip(),
                "split_at": "word"
            })
            current_chunk = []
            current_word_count = 0

    # Add the last chunk
    if current_chunk:
        chunks.append({
            "content": " ".join(current_chunk).strip(),
            "split_at": "paragraph" if current_word_count <= target_words else "word"
        })

    return chunks

# Function to process each chunk using the OpenAI API
def process_chunk(chunk, client):
    """Send a chunk of text to the OpenAI API and return the response."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a document assistant."},
            {"role": "user", "content": f"Here is a section of the transcript: {chunk}"}
        ],
        max_tokens=4000,
        temperature=0,
        top_p=0,
        frequency_penalty=0,
        presence_penalty=0
    )
    return response.choices[0].message.content.strip()

# Main function to edit a large transcript
def edit_large_transcript(input_file, output_file, client):
    # Read the large text file
    with open(input_file, "r", encoding="utf-8") as f:
        text = f.read()

    # Split the text into chunks
    chunks = semantic_chunk_split(text)

    # Write the chunks to the diagnostic file
    with open(CHUNK_INPUT_FILE, "w", encoding="utf-8") as f:
        for i, chunk in enumerate(chunks):
            f.write(f"Chunk {i + 1} (split at {chunk['split_at']}):\n")
            f.write(repr(chunk["content"]) + "\n\n")

    # Process each chunk and store the results
    edited_chunks = []
    with open(LOG_FILE, "w", encoding="utf-8") as log:
        for i, chunk in enumerate(chunks):
            log.write(f"Processing Chunk {i + 1} (split at {chunk['split_at']}):\n")
            log.write(repr(chunk["content"]) + "\n\n")
            response = process_chunk(chunk["content"], client)
            log.write(f"Response for Chunk {i + 1}:\n")
            log.write(response + "\n\n=====SPLIT=====\n\n")
            edited_chunks.append({
                "content": response,
                "split_at": chunk["split_at"]
            })

    # Reassemble the final document
    final_text = ""
    for chunk in edited_chunks:
        final_text += chunk["content"]
        if chunk["split_at"] == "paragraph":
            final_text += "\n\n"
        elif chunk["split_at"] == "sentence":
            final_text += " "
        else:
            final_text += " "

    # Write the final output to the file
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(final_text.strip())

    print(f"Editing completed. Results saved to {output_file}")

# Example usage
if __name__ == "__main__":
    from openai import OpenAI  # Importing OpenAI client
    client = OpenAI()  # Reads the OPENAI_API_KEY environment variable instead of a hard-coded key

    input_file = "input_transcript.txt"  # Replace with your input file path
    output_file = "output_transcript.txt"  # Replace with your output file path
    edit_large_transcript(input_file, output_file, client)

Analysis of the Code

Key Features

Semantic Chunk Splitting:
- Prioritizes splitting by paragraphs, then by sentences, with a word-count hard limit as the last resort.
- Targets chunks of about 500 words, with a maximum of 800 words and a hard limit of 1000 words.
- Each chunk carries metadata (split_at) recording how it was split, which makes reassembly possible.

Diagnostic Logging:
- Writes the split chunks to a file (chunk_input.txt) in repr() format for debugging.
- Logs each chunk input and its corresponding AI response in a separate diagnostic file (chunk_log.txt).
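
One gap worth noting: semantic_chunk_split flushes a chunk once it passes hard_limit, but it never cuts inside a single paragraph, so a very long paragraph (for example a transcript with no blank lines at all) still goes out whole. A minimal sketch of a true word-level fallback follows; hard_split_words is a hypothetical helper, not part of the code above.

```python
def hard_split_words(text, limit=1000):
    """Cut text into pieces of at most `limit` words (last resort)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    words = text.split()
    # Step through the word list in fixed-size windows
    return [" ".join(words[i:i + limit]) for i in range(0, len(words), limit)]

pieces = hard_split_words("a b c d e", limit=2)
```

It could be applied to any paragraph whose word count exceeds hard_limit before that paragraph enters the main loop.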

API Limits:

The function relies on the OpenAI API responding consistently for all chunks. Unexpected errors (e.g., timeouts) are not explicitly handled, which could cause the program to terminate prematurely. It also does not check context length before sending, and it does not retry or recover from errors.
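
A minimal sketch of one way to handle transient failures, assuming a plain retry with exponential backoff is acceptable. call_with_retries and its parameters are hypothetical names; the sleep argument exists only so the delay can be swapped out when testing.

```python
import time

def call_with_retries(func, *args, max_attempts=4, base_delay=1.0,
                      sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... base_delay before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

In the workflow above this would wrap the API call, e.g. call_with_retries(process_chunk, chunk["content"], client). The openai client also has its own max_retries setting, so this is best treated as an outer safety net rather than the only line of defense.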

Improvements: ideally this would be written so that progress can be resumed after a complete crash.
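
A minimal sketch of resumable processing, assuming results can be keyed by chunk index and stored as JSON. process_with_checkpoint and the chunk_progress.json file name are hypothetical; process stands in for whatever function sends one chunk to the API.

```python
import json
import os

def process_with_checkpoint(chunks, process, checkpoint_file="chunk_progress.json"):
    """Process chunks in order, saving each result so a rerun can resume."""
    done = {}
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file, "r", encoding="utf-8") as f:
            done = json.load(f)  # keys are chunk indexes stored as strings

    results = []
    for i, chunk in enumerate(chunks):
        key = str(i)
        if key not in done:
            done[key] = process(chunk)
            # Rewrite the checkpoint after every chunk so a crash loses at most one call
            with open(checkpoint_file, "w", encoding="utf-8") as f:
                json.dump(done, f)
        results.append(done[key])
    return results
```

Because the checkpoint is keyed by index only, it has to be deleted whenever the input file or the chunking parameters change.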

jlvanhulst · January 15, 2025, 12:20am

I ran your code on some of our Fireflies transcripts.

Here are my thoughts. I agree with the AI-generated help on chunking. At a minimum, you need to make sure that chunking happens so that a new ‘page’ always starts with a speaker.
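
A rough sketch of chunking on speaker turns, assuming every turn starts a line with a label such as "Speaker 1: " or "Dana Lee: ". That format, the SPEAKER_LINE pattern, and the function name are all assumptions for illustration; adjust the pattern to the real transcript layout.

```python
import re

# Assumed format: each turn begins a line with "Name: " (hypothetical pattern)
SPEAKER_LINE = re.compile(r"^[A-Z][\w .'-]{0,40}:\s")

def split_on_speaker_turns(text, max_words=800):
    """Group whole speaker turns into chunks of roughly max_words."""
    turns = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if SPEAKER_LINE.match(line) or not turns:
            turns.append(line.strip())          # a new turn starts here
        else:
            turns[-1] += " " + line.strip()     # continuation of the current turn

    chunks, current, count = [], [], 0
    for turn in turns:
        words = len(turn.split())
        if current and count + words > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(turn)
        count += words
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A single turn longer than max_words still becomes its own oversized chunk, and any ordinary line that happens to look like "Note: ..." would be mistaken for a speaker label.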

The part of your prompt that mentions ‘Speaker 1’ is going to cause problems on transcripts that HAVE speaker names. Running it several times on different models, the output sometimes used speaker 1, 2, 3, etc. even though there are speaker names in my transcript. If you leave that part out completely, it should work just fine: the output keeps whatever labels the transcript has, either speaker names or speaker 1 etc.
I would also remove the whole first part about paragraphs.

Now my biggest recommendation in this case would be to simply use Google Gemini for this. It is a super simple update of the code, and it has a huge context window. But the price might be too high? It is so much easier and faster for this job.

BUT I would say you can completely avoid all of this by simply using Google Gemini (1.5). I ran your episode 253 on that without any problem. Send me an email and I will share the outputs from both my local test and the Gemini output.

mcmase1212 · January 15, 2025, 8:57pm

I really appreciate everyone’s time.

Looking back at this today, a couple of thoughts.

I still struggle to identify the cause of the issue, since the program ran very well at first. Maybe prompt fatigue was mis-identified as the cause, but if so, then what is the cause?
I will experiment with prompting. Also I do see the discussion about breaking the text into smaller chunks, or at least semantically, but I think that’s overdesigning. I wanted a quick code/prompt that’s 99% accurate. I have to review regardless of how accurate it is, so 1 mistake per page due to poor chunking is still within design requirements.
The formatting stuff is because whisper outputs a line per 2 seconds of audio. So sometimes there’s 1 word, sometimes 10, depending on how fast the speaker is talking. Also there are no speaker tags or other identification in the initial transcript. But overall, formatting saves much less time than the revisions do. I should probably break it into a separate API call with a different prompt first.
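
A minimal sketch of reflowing that line-per-2-seconds output locally, before any API call. reflow_lines and the para_words threshold are hypothetical; the paragraph breaks it creates are based on word count only, not on topic, and it assumes the transcript contains sentence punctuation.

```python
import re

def reflow_lines(raw, para_words=120):
    """Join short transcript lines, then break into paragraphs at sentence ends."""
    text = " ".join(line.strip() for line in raw.splitlines() if line.strip())
    sentences = re.split(r"(?<=[.!?])\s+", text)
    paragraphs, current, count = [], [], 0
    for sentence in sentences:
        current.append(sentence)
        count += len(sentence.split())
        if count >= para_words:
            # Close the paragraph only at a sentence boundary
            paragraphs.append(" ".join(current))
            current, count = [], 0
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

The blank lines this produces also give a paragraph-based chunker such as split_by_paragraphs something to split on, which raw single-newline output does not.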

I’m not new to coding, but I’m new to python and AI APIs. So maybe my entire approach is wrong… so to revisit my design requirements…

DESIGN REQUIREMENTS:
Listen to an audio file and output a transcript that is 99% accurate in content, but excluding repeated words, stutters, filler words, and interjections.
Ideally also assist with formatting and speaker tags.

I don’t really have the option with my company to use expensive or subscription software that specializes in this (yet) but I calculated that OpenAI could do one for about $0.05 and that was a go.

My current solution:
1-call Whisper to perform the initial transcription.
2-initial transcript is “too accurate” and includes every word and stutter.
3-in a new script, call gpt-3.5-turbo to perform transcript cleanup.
4-(TBD) I should make a second step for formatting.

Trying to fix step 3, but also, maybe my entire process is sub-optimal and there’s a much better way to do this. Open to suggestions either way. Will experiment with prompting and breaking the text into smaller pieces (or potentially running calls asynchronously).
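
On running calls concurrently: a minimal sketch using a thread pool, assuming each chunk can be processed independently. process_chunks_concurrently is a hypothetical helper, and process stands in for the function that makes one API call.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunks_concurrently(chunks, process, max_workers=4):
    """Run process(chunk) in worker threads; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input, whichever call finishes first
        return list(pool.map(process, chunks))

demo = process_chunks_concurrently(["a", "b", "c"], str.upper, max_workers=2)
```

Keeping max_workers small is a reasonable starting point, since many parallel requests can run into API rate limits.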

But at this point I’m heading down a path where I’m investing so much time in the coding that, if I don’t get it to work, I won’t earn that time back. I could probably perform step 3 manually in 1-2 hours. My hope, if this worked perfectly, is to get the entire process down to 30-40 minutes.

dlaytonj2 · January 25, 2025, 11:40pm

I don’t experience prompt fatigue, but I took a different approach to repetitive prompts on large volumes of text: I used the Batch API. This lowers the cost of processing significantly, although it may take minutes rather than seconds to do the same task.

There is a learning curve to using the Batch API, but once you create your first program, it’s easy to adapt it to other use cases. Let me know if you’d like any sample code.
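
For orientation, a minimal sketch of the input side of a batch job: one JSON request per line, each with a custom_id so results can be matched back to chunks. build_batch_lines, the model, and the system prompt are placeholders for illustration, and the follow-up client calls are shown only as comments.

```python
import json

def build_batch_lines(chunks, model="gpt-3.5-turbo",
                      system_prompt="You are a document assistant."):
    """Return one JSON string per chunk in the Batch API request format."""
    lines = []
    for i, chunk in enumerate(chunks):
        request = {
            "custom_id": f"chunk-{i + 1}",  # used to match results back to chunks
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": chunk},
                ],
                "temperature": 0,
            },
        }
        lines.append(json.dumps(request))
    return lines

batch_lines = build_batch_lines(["first chunk of transcript", "second chunk"])
# Next steps (not run here): write "\n".join(batch_lines) to a .jsonl file,
# upload it with client.files.create(file=..., purpose="batch"), then start the
# job with client.batches.create(input_file_id=...,
#     endpoint="/v1/chat/completions", completion_window="24h").
```

Results are not guaranteed to come back in input order, so reassembly should sort on custom_id rather than on position in the output file.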

Just a thought, hope it helps.
