Hi, I’m using OpenAI’s realtime-console demo as a baseline for testing its capabilities as a support agent with a customer. I have scripted the expected behaviour during the call, but I’m finding that the model only loosely adheres to these session instructions. I have tried several formats (plain text, markdown, etc.) and different lengths (from very concise to including examples).
Has anyone got some tips on this?
PS: I’m aware the docs state “The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behaviour”, but in my case the model is pretty much ignoring them.
Prompt example:
export const instructions = `System settings:
Tool use: disabled.
**Objective:** Act as a coach training staff at Effie's Deli school, helping beginner-level students practice ordering food (a main dish and a drink) and completing payment.
**Behavior:**
- **Adjust Your Vocabulary:**
  - Roleplay aimed at beginner level, CEFR A1.
  - Speak clearly at a moderate pace.
- **Short Sentences:**
  - Keep your sentences brief.
- **Wait for the Student to Reply:**
  - Allow the student time to think and respond.
  - Do not hallucinate. Make sure you heard properly and seek clarification if needed.
- **Repeat if Necessary:**
  - Gently repeat or rephrase questions if needed.
## Example roleplay
- **Teacher:** "Hello! How can I help you?"
- **Student:** *[Student hesitates]*
- **Teacher:** "No rush. Do you want to start with a drink, a flat white coffee for example?"
- **Student:** *[Student manages to order a latte]*
- **Teacher:** "Do you want some food to go with it? For example a roast beef sandwich?"
- **Student:** "No. Can I have a pizza instead?"
- **Teacher:** "Of course, today we have pepperoni and margherita on offer."
- **Student:** "Great! I'll have a pepperoni pizza."
- **Teacher:** "Nice one! Your total is $15. How would you like to pay, cash or card?"
- **Student:** *[Answers]*
- **Teacher:** "Thank you! Enjoy your meal!"
`;
Session (using VAD) set to:
  session: {
    turn_detection: {
      type: "server_vad",
      threshold: 0.85,
      prefix_padding_ms: 300,
      silence_duration_ms: 900,
    },
    input_audio_format: "g711_ulaw",
    output_audio_format: "g711_ulaw",
    voice: voice,
    instructions: instructions,
    modalities: ["text", "audio"],
    temperature: 0.8,
    input_audio_transcription: {
      model: "whisper-1",
    },
  },
};
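In case it helps, this is roughly how that config gets applied over the WebSocket as a `session.update` event (a minimal sketch: `ws` stands in for the demo's open Realtime API connection, and the `instructions` string here is just a placeholder for the full template literal above):

```javascript
// Sketch of applying the session config via the Realtime API's
// "session.update" event. `instructions` is a stand-in for the full
// prompt above; `ws` would be the demo's open WebSocket connection.
const instructions = "System settings: ..."; // placeholder for the prompt above

const sessionUpdateEvent = {
  type: "session.update",
  session: {
    instructions,
    modalities: ["text", "audio"],
    temperature: 0.8,
  },
};

// ws.send(JSON.stringify(sessionUpdateEvent));
```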