How to summarize a large transcript file

Hi @sohailahmad0757!

Is your goal to create a single summary of the full conversation flow, for example:

AI Teacher: “Can you tell me what the biggest planet in our solar system is?”
Student: “Umm, I think it’s Jupiter!”
AI Teacher: “That’s correct! Jupiter is the largest planet in our solar system. Now, do you remember which planet is closest to the Sun?”
Student: “Oh, that’s Mercury!”
AI Teacher: “Great job! Mercury is the closest planet to the Sun. Here’s a tricky one: which planet is known as the ‘Red Planet’ because of its color?”

If so, you basically need to include the full conversation flow in a single line of your batch input (JSONL) file. This could look as follows (illustrative only; the user message is abbreviated in this example):

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "you need to create a summary of the conversation between the student and AI (Teacher)"},{"role": "user", "content": "Conversation: AI Teacher: Can you tell me what the biggest planet in our solar system is?, Student: Umm, I think it's Jupiter!, AI Teacher: That's correct! Jupiter is the largest planet in our solar system. Now, do you remember which planet is closest to the Sun?, Student: Oh, that's Mercury!, AI Teacher: Great job! Mercury is the closest planet to the Sun. Here's a tricky one: which planet is known as the 'Red Planet' because of its color? ... "}],"max_tokens": 1000}}

You mentioned at the beginning that your transcript file is large and exceeds the token limit. This is where chunking comes into play: split the conversation flow into chunks that each fit within the limit, then put each chunk into its own line of the batch input file, using the same structure as above.
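A minimal sketch of the chunking step in Python. It assumes the transcript has already been parsed into a list of `(speaker, text)` turns. It estimates tokens with a rough "about 4 characters per token" heuristic instead of a real tokenizer. The names `chunk_turns`, `build_batch_lines` and `CHARS_PER_TOKEN` are hypothetical helpers for illustration. Turns are never split mid-sentence, so each chunk stays readable on its own.

```python
import json

# Assumption: rough heuristic of ~4 characters per token for English text.
# A real tokenizer (e.g. tiktoken) would give exact counts.
CHARS_PER_TOKEN = 4

SYSTEM_PROMPT = ("you need to create a summary of the conversation "
                 "between the student and AI (Teacher)")


def chunk_turns(turns, max_tokens_per_chunk):
    """Group (speaker, text) turns into chunks under an estimated token budget.

    Turns are kept whole, so a single very long turn becomes its own chunk.
    """
    budget = max_tokens_per_chunk * CHARS_PER_TOKEN
    chunks, current, size = [], [], 0
    for speaker, text in turns:
        line = f"{speaker}: {text}"
        # Start a new chunk when adding this turn would exceed the budget.
        if current and size + len(line) > budget:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks


def build_batch_lines(chunks, model="gpt-4o", max_tokens=1000):
    """Return one JSONL line (one batch request) per chunk."""
    lines = []
    for i, chunk in enumerate(chunks, start=1):
        request = {
            "custom_id": f"request-{i}",  # hypothetical naming scheme
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user",
                     "content": "Conversation: " + "\n".join(chunk)},
                ],
                "max_tokens": max_tokens,
            },
        }
        lines.append(json.dumps(request))
    return lines
```

The returned lines can be joined with newlines and written to a `.jsonl` file, which is then uploaded as the batch input. Pick `max_tokens_per_chunk` with headroom below the model's context limit, since the estimate is only approximate.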

Note that each line is executed as a separate API call. There is no dependency between lines, and the conversation content in one line is not visible to the request in any other line of your batch.

Once the batch job has completed, you can concatenate the individual summaries into a complete summary. This may still require another API call to make sure the summaries connect smoothly and are consistent in language and style.
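A minimal Python sketch of that last step. Lines in the batch output file are not guaranteed to be in the same order as the input, so each result is matched back through its `custom_id`. This sketch assumes IDs of the form `request-<n>`. The helper names `collect_summaries` and `build_merge_request`, and the wording of the merge prompt, are hypothetical.

```python
import json


def collect_summaries(output_lines):
    """Extract chunk summaries from batch output lines, ordered by custom_id.

    Assumes custom_ids look like "request-<n>" and that each successful
    response body is a chat completion.
    """
    results = {}
    for raw in output_lines:
        if not raw.strip():
            continue
        record = json.loads(raw)
        response = record.get("response")
        if record.get("error") or not response or response.get("status_code") != 200:
            raise RuntimeError(f"request {record.get('custom_id')} failed")
        # Sort numerically so "request-10" comes after "request-2".
        index = int(record["custom_id"].rsplit("-", 1)[1])
        results[index] = response["body"]["choices"][0]["message"]["content"]
    return [results[i] for i in sorted(results)]


def build_merge_request(summaries, model="gpt-4o", max_tokens=1000):
    """Build one chat-completions body asking the model to merge partial summaries."""
    joined = "\n\n".join(f"Part {i}: {s}" for i, s in enumerate(summaries, start=1))
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Combine these partial summaries of one conversation "
                        "into a single coherent summary."},
            {"role": "user", "content": joined},
        ],
        "max_tokens": max_tokens,
    }
```

If plain concatenation is enough, `"\n\n".join(collect_summaries(lines))` already gives the complete summary. Otherwise, the merge body can be sent as a normal (non-batch) request, e.g. with `client.chat.completions.create(**body)` in the official Python SDK.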
