Prompts-Instructions for Realtime API

Hi, I’m using OpenAI’s realtime-console demo as a baseline to test the Realtime API’s capabilities as a support agent talking to a customer. I have scripted the expected behaviour for the call, but the model adheres only loosely to these session instructions. I have tried several formats (plain text, Markdown, etc.) and different lengths (from very concise to including examples).

Has anyone got some tips on this?

PS: I’m aware the docs state “The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behaviour” but the model is pretty much ignoring my instructions.

Prompt example:

export const instructions = `System settings: 
Tool use: disabled.

**Objective:** Act as a coach training staff at Effie's Deli school, helping beginner-level students practice ordering food (a main dish and a drink) and completing payment.

**Behavior:**
- **Adjust your vocabulary**:
  - Roleplay aimed at beginner level, CEFR A1.
  - Speak clearly at a moderate pace.
- **Short Sentences:**
  - Keep your sentences brief.
- **Wait for the student to reply**
  - Allow the student time to think and respond. 
  - Do not hallucinate. Make sure you heard properly. Ask for clarification if needed.
- **Repeat if Necessary:**
  - Gently repeat or rephrase questions if needed.

## Example roleplay
- **Teacher:** "Hello! How can I help you?"
- **Student:** *[Student hesitates]*
- **Teacher:** "No rush. Do you want to start with a drink, a flat white coffee for example?"
- **Student:** *[Student manages to order a latte]*
- **Teacher:** "Do you want some food to go with it? For example a roast beef sandwich?"
- **Student:** "No. Can I have a pizza instead?"
- **Teacher:** "Of course, today we have pepperoni and margherita on offer."
- **Student:** "Great! I'll have a pepperoni pizza."
- **Teacher:** "Nice one! Your total is $15. How would you like to pay? Cash or card?"
- **Student:** *[Answers]*
- **Teacher:** "Thank you! Enjoy your meal!"
`;

Session (using VAD) set to:

    session: {
        turn_detection: {
            type: "server_vad",
            threshold: 0.85,
            prefix_padding_ms: 300,
            silence_duration_ms: 900,
        },
        input_audio_format: "g711_ulaw",
        output_audio_format: "g711_ulaw",
        voice: voice,
        instructions: instructions,
        modalities: ["text", "audio"],
        temperature: 0.8,
        input_audio_transcription: {
            model: "whisper-1",
        },
    },
};

How are you passing the instructions?
Could you provide some code snippets and the payloads you are passing? :hugs:

In your case, if a student stays silent while thinking, no response.create is triggered, so the AI won’t send an additional response.

The student would have to audibly say “I think uhm…” for it to register that it needs to talk again. Otherwise it will wait indefinitely.
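
Since silence never starts a new turn under server VAD, one possible workaround is a client-side timer that sends its own response.create when the student has been quiet for too long. Below is a minimal sketch, not a tested integration: `send` is assumed to be your own function that forwards a client event over the Realtime connection, and `createSilenceNudger`, the 8-second default, and the nudge wording are all made up for illustration. Note that `response.done` marks the end of generation, not of audio playback, so the timeout likely needs tuning.

```javascript
// Hypothetical helper: re-prompts the model when the student stays silent.
// `send` is assumed to forward a client event over your Realtime connection.
// `timers` is injectable so the logic can be exercised without real delays.
const defaultTimers = {
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: (handle) => clearTimeout(handle),
};

function createSilenceNudger(send, timeoutMs = 8000, timers = defaultTimers) {
  let handle = null;

  function cancel() {
    if (handle !== null) {
      timers.clearTimeout(handle);
      handle = null;
    }
  }

  function arm() {
    cancel();
    handle = timers.setTimeout(() => {
      handle = null;
      // Ask the model for one more turn, with guidance for this response only.
      send({
        type: "response.create",
        response: {
          instructions:
            "The student has been silent. Gently repeat or rephrase your last question.",
        },
      });
    }, timeoutMs);
  }

  // Feed every server event received from the API into this function.
  function onServerEvent(event) {
    if (event.type === "response.done") arm(); // model finished generating
    if (event.type === "input_audio_buffer.speech_started") cancel(); // student speaks
  }

  return { onServerEvent, cancel };
}
```

Each nudge produces another `response.done`, so the timer re-arms itself; you may want to cap the number of consecutive nudges.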

For the other points, such as short sentences, try specifying how short (for example, at most 10 words).
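
To make that concrete, here is a rough sketch of how the instructions string could be built with measurable limits instead of "keep your sentences brief". `buildInstructions` and its default numbers are purely illustrative, and nothing guarantees the model will follow numeric limits any more strictly than vague ones.

```javascript
// Hypothetical builder that turns vague style rules into measurable ones.
function buildInstructions({ maxWordsPerSentence = 10, maxSentencesPerTurn = 2 } = {}) {
  return [
    "You are a coach at Effie's Deli school. Students are CEFR A1 beginners.",
    `Use at most ${maxWordsPerSentence} words per sentence.`,
    `Say at most ${maxSentencesPerTurn} sentences per turn, then stop and wait.`,
    "Ask exactly one question per turn.",
    "If you did not hear the student clearly, ask them to repeat. Never guess an order.",
  ].join("\n");
}

// Example: a stricter variant for very early beginners.
const strictInstructions = buildInstructions({ maxWordsPerSentence: 8 });
```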

As far as I know, temperature 0.4 won’t work in your session.update because the minimum is 0.6. I could be wrong, and they may have changed this, but do check whether the console prints an error for such a low temperature.
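
A small sketch of how you could guard against this on the client: clamp the temperature before sending it, and turn any `error` server event into a readable log line. The 0.6 minimum is the one mentioned above; the 1.2 maximum is my assumption based on the API reference at the time of writing, so verify both. `clampTemperature` and `describeServerError` are hypothetical helper names.

```javascript
// Assumed bounds: 0.6 is the minimum mentioned above; 1.2 is an assumed
// upper limit. Check the current API reference before relying on either.
const MIN_TEMPERATURE = 0.6;
const MAX_TEMPERATURE = 1.2;

function clampTemperature(value) {
  return Math.min(MAX_TEMPERATURE, Math.max(MIN_TEMPERATURE, value));
}

// Returns a readable message for `error` server events, or null otherwise.
function describeServerError(event) {
  if (event.type !== "error") return null;
  const { code, message, param } = event.error ?? {};
  return `Realtime API error (${code ?? "unknown"}) on ${param ?? "n/a"}: ${message ?? ""}`;
}
```

Passing every incoming server event through `describeServerError` and logging non-null results makes a rejected session.update much harder to miss.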

For the AI to rephrase, the student may need to ask for it explicitly.

Let me know if this helped or if there’s anything else I can do! :hugs:

Cheers @j.wischnat! I used temperature=0.8 for most of my tests. Nice tip, I will specify the length of the sentences. The weird thing is that even when the student is just thinking, the model hallucinates and generates answers (it seems it’s got a thing for pepperoni pizza ;))

Could the hallucinations be due to stray audio being picked up?

Are you using a speaker setup?
Are there other people in the background talking?

Server VAD does not eliminate these factors.

With a speaker setup, the AI will sometimes pick up its own voice, so you will need your own AEC (Acoustic Echo Cancellation) implementation.

If other people are talking nearby, your own noise suppression implementation might help.
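
If the microphone is captured in a browser (an assumption; it does not apply to a telephony audio path), a cheap first step before writing your own processing is to request the browser's built-in echo cancellation and noise suppression through standard `getUserMedia` constraints. This is only a sketch: browsers treat these constraints as requests, and `openProcessedMic` is a hypothetical helper name.

```javascript
// Constraints asking the browser to apply its built-in audio processing.
const micConstraints = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
};

// Only works in a browser, and only once microphone permission is granted.
async function openProcessedMic() {
  const stream = await navigator.mediaDevices.getUserMedia(micConstraints);
  const [track] = stream.getAudioTracks();
  // getSettings() reports which of the requested options are actually active.
  console.log(track.getSettings());
  return stream;
}
```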

Otherwise, try push-to-talk instead of server_vad.
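
Roughly, push-to-talk means disabling turn detection and deciding on the client when a turn ends. A minimal sketch, assuming `send` is your own function that forwards a client event over the Realtime connection; `createPushToTalk` and its method names are hypothetical, and how you capture and base64-encode the audio is up to your setup.

```javascript
// Hypothetical push-to-talk wiring. `send` is assumed to forward a client
// event over your Realtime connection.
function createPushToTalk(send) {
  // Turn off server VAD so the API never ends a turn on its own.
  send({ type: "session.update", session: { turn_detection: null } });

  return {
    // Call on button press: discard audio left over from before the press.
    start() {
      send({ type: "input_audio_buffer.clear" });
    },
    // Call for each captured chunk while the button is held (base64 audio).
    append(base64Audio) {
      send({ type: "input_audio_buffer.append", audio: base64Audio });
    },
    // Call on button release: commit the audio as a user turn, then ask for a reply.
    stop() {
      send({ type: "input_audio_buffer.commit" });
      send({ type: "response.create" });
    },
  };
}
```

With this in place the model only answers after an explicit `stop()`, so background speech or its own voice from the speaker cannot trigger a response by itself.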

Good luck and do let me know if there are other issues still! :hugs:

EDIT:

Also, feel free to provide more code snippets. Maybe the issue is not with the configuration itself but with how it is passed! :thinking:
