Assembly is a great start. It’ll distinguish between Speaker A and Speaker B and so on. If you want something more specific (i.e. actual names of the speakers) then you may implement something like this…
1. Call an LLM API to look at the transcript from Assembly, identify each each speaker by name, and return the result as a JSON object with keys ‘Speaker A’, ‘Speaker B’, ‘Speaker C’, etc., and their corresponding full names as values.
This function does the following:
a) Prepares a message for the LLM with instructions and the transcript.
b) Sends the message to the LLM API (Claude in this case).
c) Receives the response containing speaker identifications.
def identify_speakers(transcript_text):
messages = [
{
“role”: “user”,
“content”: f"““Based on the following transcript, identify all speakers by their full names.
Return the result as a JSON object with keys ‘Speaker A’, ‘Speaker B’, ‘Speaker C’, etc., and their corresponding full names as values.
If you cannot confidently identify a speaker, use ‘Unknown’ as the value.
Include all speakers mentioned in the transcript, even if there are more than two.
Transcript:
{transcript_text}””"
}
]
response = client.messages.create(
model=“LLM of your choice”,
max_tokens=whatever,
messages=messages
)
2. Then python to parse the JSON:
This:
a) Extracts the text content from the response.
b) Attempts to parse it as JSON.
c) If parsing fails, it tries to find a JSON-like structure in the content.
d) Returns the parsed JSON or an error message if parsing fails.
content = response.content
if isinstance(content, list) and len(content) == 1 and hasattr(content[0], ‘text’):
content = content[0].text
try:
return json.loads(content)
except json.JSONDecodeError:
json_match = re.search(r’{.*}', content, re.DOTALL)
if json_match:
try:
return json.loads(json_match.group())
except json.JSONDecodeError:
pass
return {“error”: “Could not parse content”, “raw_content”: str(content)}
3. Then define a function to use the parsed JSON to replace speaker labels with names and return an updated transcript
This process:
a) Defines a function to replace speaker labels with full names.
b) Combines the AssemblyAI transcript utterances into a single string.
c) Calls the identify_speakers function to get the speaker mapping.
d) Uses the replace_speaker_labels function to update the transcript with full names.
def replace_speaker_labels(transcript_text, speaker_mapping):
for speaker_label, speaker_name in speaker_mapping.items():
transcript_text = transcript_text.replace(speaker_label, speaker_name)
return transcript_text
Get the full transcript text
full_transcript = “\n”.join([f"Speaker {u.speaker}: {u.text}" for u in transcript.utterances])
Identify speakers
speaker_mapping = identify_speakers(full_transcript)
Replace speaker labels with names
updated_transcript = replace_speaker_labels(full_transcript, speaker_mapping)