Training on transcribed audio output - How do I make the AI know who said what?

AlexTully · May 20, 2022, 12:28pm

I asked this by private chat, but then I saw this forum and realised that the answer to this might be relevant to many other users, so I thought I’d post it here too.

Here’s my situation: I have about 10,000 text files containing transcriptions of things that my students have said. Each file corresponds to what one student has said on one particular day. Anyway, I’d like to train an AI model on this data to answer queries of the following forms:
• Give me a sentence that <student> might say. This sentence must contain <X>.
• Give me a sentence that <student> might say about <topic>. This sentence must contain <X>.
<X> is some item of language, e.g. a sequence of words such as “didn’t have to”, or a grammatical structure such as a second conditional.
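For the word-sequence case, one way to check that a generated sentence really contains the target item is a whole-word match that ignores case and the difference between curly and straight apostrophes. This is a minimal sketch; `contains_item` is a hypothetical helper, and it cannot detect grammatical structures such as a second conditional, which would need a parser or a manual check.

```python
import re

def normalize(text: str) -> str:
    # Map curly apostrophes to straight ones and lowercase, so that
    # "didn’t" and "Didn't" compare equal.
    return text.replace("\u2019", "'").replace("\u2018", "'").lower()

def contains_item(sentence: str, item: str) -> bool:
    """Return True if the word sequence `item` occurs in `sentence` as whole words."""
    words = normalize(item).split()
    # Allow any run of whitespace between the words; the \b anchors stop
    # "have to" from matching inside "behave tomorrow".
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return re.search(pattern, normalize(sentence)) is not None

print(contains_item("I didn’t have to go to work today.", "didn't have to"))  # True
print(contains_item("I had to go to work today.", "didn't have to"))  # False
```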

For open-ended generation, the manual recommends leaving the prompt empty, so my training data would look something like:
{"prompt": "", "completion": " <sentence>"}
{"prompt": "", "completion": " <sentence>"}
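A minimal sketch of producing one line in this empty-prompt format with Python's standard `json` module; `open_ended_record` is a hypothetical helper name. Generating the lines with `json.dumps` rather than typing them by hand avoids the curly-quote and escaping problems that make hand-written JSONL invalid.

```python
import json

def open_ended_record(sentence: str) -> str:
    # Empty prompt; the completion starts with a single space.
    # ensure_ascii=False keeps characters such as ’ readable in the file.
    return json.dumps({"prompt": "", "completion": " " + sentence.strip()}, ensure_ascii=False)

print(open_ended_record("I go drinking every weekend."))
# {"prompt": "", "completion": " I go drinking every weekend."}
```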
But where would I put in which student said which sentence? In all of the transcriptions, each student refers to themselves as “I” or “me”, so it’s impossible to know which students have said which sentences. For example, Student A’s transcription might say “I don’t drink because I’m worried about my health.” whereas Student B’s says “I go drinking every weekend.”. It would be problematic if the AI generated output where Student A is talking about his love of alcohol, or where Student B is talking about being teetotal. So I think the training data definitely needs to include each student’s name alongside what they said. My question is how do I do this?

Should I do something like the below?
{"prompt": "", "completion": " Student A: I don’t drink because I’m worried about my health."}
{"prompt": "", "completion": " Student B: I go drinking every weekend."}
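The first layout could be generated like this. This is a sketch only; `speaker_in_completion` and `speaker_of` are hypothetical helpers. Because the name is part of the completion, the model learns to produce the name together with the sentence.

```python
import json

def speaker_in_completion(student: str, sentence: str) -> str:
    # Option 1: keep the prompt empty and prefix the speaker's name
    # inside the completion, e.g. " Student A: I don't drink."
    completion = f" {student}: {sentence.strip()}"
    return json.dumps({"prompt": "", "completion": completion}, ensure_ascii=False)

def speaker_of(completion: str) -> str:
    # Recover the name from a generated completion: everything before the first colon.
    return completion.strip().split(":", 1)[0]
```

One consequence of this design is that with an empty prompt the model, not you, picks which student is speaking, so getting a sentence from a particular student would presumably mean filtering generated samples with something like `speaker_of`.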

Or should I put the student’s names as the prompts?
{"prompt": "Student A", "completion": " I don’t drink because I’m worried about my health."}
{"prompt": "Student B", "completion": " I go drinking every weekend."}
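A sketch of the second layout, where the name is the prompt. It assumes the conventions from OpenAI's (legacy) fine-tuning data-preparation guide: end every prompt with a fixed separator and end every completion with a stop sequence. The particular `SEPARATOR` and `STOP` strings and the helper names below are illustrative choices, not requirements.

```python
import json

# Illustrative choices, not requirements of any API.
SEPARATOR = "\n\n###\n\n"  # marks where the prompt ends
STOP = "\n"                # marks where the completion ends

def speaker_as_prompt(student: str, sentence: str) -> str:
    # Option 2: the student's name is the prompt; the sentence is the completion.
    record = {
        "prompt": f"{student}{SEPARATOR}",
        "completion": f" {sentence.strip()}{STOP}",
    }
    return json.dumps(record, ensure_ascii=False)

def inference_prompt(student: str) -> str:
    # At generation time, send exactly the training-time prompt so the model
    # knows which student to imitate, and pass STOP as the stop sequence.
    return f"{student}{SEPARATOR}"
```

With this layout you choose the student at generation time. The topic and the required language item are not encoded, so those would still have to be handled some other way.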

Or should I do something different?

TLDR: What’s the best way to indicate the name of the speaker in training data that consists of transcribed audio output?

Thanks in advance,

Alex

daveshapautomator · May 20, 2022, 2:09pm

You would need to add a layer of speaker identity recognition. See the following:

AlexTully · May 20, 2022, 11:03pm

Thanks for the reply. But I already know each speaker’s identity. Each file of transcribed text only contains output from one speaker. My question is how to include the speaker’s identity in the training data.
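Since each file holds one speaker, the name can be attached while the files are converted. The sketch below assumes a hypothetical naming scheme `<student>_<YYYY-MM-DD>.txt` (e.g. `StudentA_2022-05-20.txt`), splits each transcript into sentences with a naive regular expression (it will mis-split abbreviations such as "Mr."), and uses the name-as-prompt layout; switching to the name-in-completion layout only changes `records_for_file`. All function names are hypothetical.

```python
import json
import re
from pathlib import Path

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def student_from_filename(path: Path) -> str:
    # Hypothetical scheme "<student>_<YYYY-MM-DD>.txt": drop the date part.
    return path.stem.rsplit("_", 1)[0]

def split_sentences(text: str) -> list[str]:
    # Naive split on ., ! or ? followed by whitespace; good enough for a sketch.
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]

def records_for_file(path: Path, text: str) -> list[dict]:
    # Name-as-prompt layout with an assumed separator and stop sequence.
    student = student_from_filename(path)
    return [
        {"prompt": f"{student}\n\n###\n\n", "completion": f" {s}\n"}
        for s in split_sentences(text)
    ]

def build_jsonl(transcript_dir: Path, out_path: Path) -> int:
    # Write one JSON object per line and return how many records were written.
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for path in sorted(transcript_dir.glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            for record in records_for_file(path, text):
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```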
