Hi Everyone,
I’m building a real-time AI voice assistant using OpenAI’s Realtime Voice API. The assistant reads questions from a structured JSON, asks the caller each question in turn, and collects/validates responses based on rules defined in that same JSON.
I’m experiencing a persistent and reproducible issue where the model autonomously adds or removes digits from phone numbers to match an expected format — even when explicitly instructed not to do so.
Expected behavior
When a caller provides a phone number (valid or invalid), the expected sequence is:
- The model captures the digits exactly as heard and passes them to the tool call.
- The internal validation logic checks the value. If it is invalid, it returns a validation error message to the model.
- The model reads the error back to the caller (e.g. “Invalid phone number format. Phone number must be 10 digits: 123-123-1234”).
- The model re-collects the caller’s response and again passes only what was literally heard, with no modification.
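To make the "pass them to the tool call" step concrete, here is a rough sketch of what such a tool definition could look like, written as a Python dict. The tool name `submit_answer` and the fields `question_id` and `value` are hypothetical and only for illustration, not the actual names from my project. The shape (`type`, `name`, `description`, `parameters`) follows the function-tool format as I understand it for the `tools` list of a Realtime session, so please treat it as an assumption rather than a reference.

```python
# Hypothetical tool definition (names are illustrative, not from the real project).
# The value is typed as a plain string and deliberately carries no length or
# pattern constraint, so the schema itself does not hint at a 10-digit target.
submit_answer_tool = {
    "type": "function",
    "name": "submit_answer",
    "description": (
        "Submit the caller's answer exactly as spoken. "
        "Do not add, remove, or reorder digits."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "question_id": {
                "type": "string",
                "description": "Identifier of the question being answered.",
            },
            "value": {
                "type": "string",
                "description": "The caller's response, verbatim.",
            },
        },
        "required": ["question_id", "value"],
    },
}
```

Whether leaving the format out of the schema actually reduces the auto-correction is untested; it is just one place where the expected format could leak to the model.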
How the internal validation logic works
The flow is designed as follows:
Caller speaks phone number
↓
Model makes tool call with captured digits
↓
Internal logic checks digit count against expected format (10 digits)
↓
[Valid] → Proceeds to next question
[Invalid] → Returns ValidationError:
"Invalid phone number format.
Phone number must be in the following format:
10 Digit - 123-123-1234"
↓
Model receives ValidationError and reads it back to caller
↓
Model re-asks for phone number
↓
Caller provides new input → Model should relay EXACTLY as heard (back to the top of the flow)
The validation responsibility is entirely on the internal system — the model’s only role is to act as a neutral relay: capture digits as spoken and pass them through untouched. It should never attempt to interpret, reformat, or correct a value before or after a validation error.
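The validation step in the flow above can be sketched roughly like this. It is a simplified illustration, not production code: the function name `validate_phone` and the shape of the returned dict are assumptions, and only the error text and the 10-digit rule come from the flow described above.

```python
import re

# Error text as returned to the model in the flow above.
ERROR_MESSAGE = (
    "Invalid phone number format. "
    "Phone number must be in the following format: "
    "10 Digit - 123-123-1234"
)


def validate_phone(captured: str) -> dict:
    """Check the digit count of the value the model passed in the tool call."""
    # Strip separators such as spaces and dashes; keep digits only.
    digits = re.sub(r"\D", "", captured)
    if len(digits) == 10:
        return {"status": "valid", "value": digits}
    # The value is rejected as-is; nothing is padded or trimmed here.
    return {"status": "ValidationError", "message": ERROR_MESSAGE}
```

For example, `validate_phone("638-590-581")` returns the `ValidationError` result because only 9 digits are present, which is the response the model is supposed to read back without touching the value.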
Actual behavior
The model is auto-correcting the digit count to satisfy the expected 10-digit format:
- If the caller provides 9 digits, the model adds 1 digit (sourced from within the provided number)
- If the caller provides 11 digits, the model truncates 1 digit
This happens:
- Sometimes on the first attempt, before any validation error is returned
- Sometimes after a validation error has already been read back to the caller
Reproducible examples
Example 1 — Auto-correction on first attempt
Model: “May I have your 10-digit phone number please?”
Caller: "67893 2038" ← 9 digits
Tool call value: 678-932-0288 ← 10 digits (model added 8, which already appeared in the number)
Example 2 — Auto-correction after validation error
Model: “May I have your 10-digit phone number please?”
Caller: "638-590-581" ← 9 digits
(Validation error returned and read to caller)
Model: “There seems to be an issue with the phone number format. Please say your 10-digit phone number, including the area code, like 123-123-1234.”
Caller: "638-590-581" ← same 9 digits again
Tool call value: 638-509-0581 ← 10 digits (model inserted 0)
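The mismatch in both examples can be detected mechanically by comparing the digits in the caller's turn with the digits in the tool call value. Below is a minimal sketch of that comparison. The helper names `digits_only` and `relay_is_faithful` are made up for illustration, and it assumes a transcript of the caller's turn is available with the number already written as digits (spoken-word numbers like "six three eight" would need extra handling).

```python
import re


def digits_only(text: str) -> str:
    """Return only the digit characters of a string, in order."""
    return re.sub(r"\D", "", text)


def relay_is_faithful(heard: str, tool_value: str) -> bool:
    """True if the tool call value has exactly the digits that were heard."""
    return digits_only(heard) == digits_only(tool_value)


# The two cases from the examples above: (caller transcript, tool call value).
examples = [
    ("67893 2038", "678-932-0288"),
    ("638-590-581", "638-509-0581"),
]

for heard, sent in examples:
    print(
        f"heard={digits_only(heard)} ({len(digits_only(heard))} digits), "
        f"sent={digits_only(sent)} ({len(digits_only(sent))} digits), "
        f"faithful={relay_is_faithful(heard, sent)}"
    )
```

A check like this does not stop the model from altering the number, but it does let the backend flag altered values instead of accepting them as valid 10-digit numbers.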
Prompt I’ve tried
IMPORTANT RULES FOR NUMERICAL RESPONSE:
1. Act as a literal, non-judgmental data-entry clerk.
2. For phone numbers, zip codes, OTPs, IDs, or any numeric fields, capture ONLY the raw digits heard; NEVER add or remove digits.
3. ValidationError is a SYSTEM MESSAGE for the caller only — NOT an instruction for you to fix the value.
4. After a ValidationError: read it to the caller, capture their new response, and send it EXACTLY as heard.
5. Validation is the SYSTEM's responsibility. Your only job is to capture and relay digits.
6. Return the value exactly as received in the transcript. No interpretation. No correction.
I’ve tested multiple variations of this prompt, and also tested with no prompt at all. Interestingly, the auto-correction behavior occurs less frequently without a prompt — suggesting the model may be interpreting the instructions as implicit permission or guidance to “fix” invalid inputs.
Questions
How can I ensure that the model:
- Never modifies numeric input
- Does not auto-correct phone numbers
- Acts strictly as a pass-through capture system for digits
Is there:
- A specific parameter/config to disable normalization?
- A recommended pattern for strict digit fidelity in realtime voice flows?
Any suggestions, best practices, or similar experiences would be greatly appreciated.
Thanks in advance!