The preview of the Marin voice in the playground is off (she says "Hi dare" to start), which is especially weird because in TTS it says "hi there" properly.
More importantly, will there be a mini version of this? The previous model's performance was erratic when the prompt/context was near or over the limit; the mini had a larger window and was thus attractive.
Will there be future adjustments to allow greater flexibility with temperature?
The new guide looks great, look forward to testing it out in detail.
In 2022, as a GPT user, I couldn’t have imagined how much technology would evolve. Currently, for example, I need to become a programmer to understand and improve my experience. If anyone knows of courses or something similar that I can learn from, please feel free to let me know.
Hi, has anyone been able to connect and complete a successful conversation using SIP as an endpoint? I can connect and hold a conversation via Asterisk when I use the ARI approach, but I can't do so using the new SIP interface. Through the suggested webhook server I see a successful connection and message, but the call always fails with a 406 code. Thanks.
Develop a beginner-friendly coursework plan in backend development (Python or Node.js) and a web client/app, with a focused goal of enabling learners to build full-stack AI products using OpenAI APIs. Model the teaching path on exercises that use AI-assisted Socratic guidance. By the end, learners should be proficient in:
Backend API integration: consuming third-party APIs, proxying API-to-API data, implementing AI-initiated function/tool calls, and performing input validation and safety checks.
Databases: modeling users and billing; managing AI conversations and end-user customization; and handling policy/trust compliance classification.
Web sessions: session management, logins, client interactions, and authentication.
Audio and real-time protocols: WebSockets, WebRTC, SIP, relaying, telephony, digital-audio fundamentals, and device interactions.
Cloud and DevOps: cloud services and workers; hosting platforms; deployment; configuration management; and version control.
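As a taste of the "input validation and safety checks" skill listed above, here is a minimal sketch of validating a model-issued tool call before executing it. The tool name, schema shape, and helper are all hypothetical, invented for illustration, not part of any real API:

```python
import json

# Hypothetical schema for one tool the model may call; names are illustrative.
WEATHER_TOOL_SCHEMA = {
    "name": "get_weather",
    "required": {"city": str},
    "optional": {"units": str},
}

def validate_tool_call(schema: dict, arguments_json: str) -> dict:
    """Parse and validate model-supplied tool arguments before executing.

    Raises ValueError on malformed JSON, unexpected or ill-typed fields, or
    missing required fields -- a basic safety gate for AI-initiated calls.
    """
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    allowed = {**schema["required"], **schema["optional"]}
    for key, value in args.items():
        if key not in allowed:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, allowed[key]):
            raise ValueError(f"argument {key} has wrong type")
    for key in schema["required"]:
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    return args

# Example: a well-formed call passes; a call with an unknown field raises.
ok = validate_tool_call(WEATHER_TOOL_SCHEMA, '{"city": "Paris"}')
```

A real course module would extend this with range checks and an allow-list of callable tools before anything touches a database or external service.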
First AI prompt: improve the AI instructions used to generate the course.
You are an expert full‑stack engineer, multimodal AI practitioner, and curriculum designer. Create a complete, self‑contained course package that turns a beginning hobby programmer into a competent multimodal AI product developer who can design, build, deploy, and maintain real‑world full‑stack AI apps using OpenAI APIs.
Audience and assumptions
Audience: beginners with basic programming (variables, loops, functions) but little web/AI experience.
Do not ask the user clarifying questions. Choose sensible defaults and document assumptions.
Provide parallel backend tracks: Python (FastAPI) and Node.js (Express or Nest). Frontend: React/Next.js.
Local-first development; cross-platform (Windows, macOS, Linux); no secrets in code; use .env.
Deliverables to produce
Program overview: goals, learner persona, prerequisites, outcomes, weekly pacing options (4-, 8-, 12-week), estimated time per module.
Syllabus and roadmap with dependency graph of skills.
Module plans with: learning objectives, short readings, key concepts, step-by-step labs, Socratic prompts, quizzes, and reflection.
Hands-on code labs for each topic with starter and solution code, test cases, and troubleshooting notes.
Three capstone projects with specs, acceptance criteria, rubrics, and reference implementations.
Assessment plan: rubrics, checklists, auto-graded tests, and demonstration benchmarks.
Cost and reliability: exponential backoff, circuit breakers, budget guards, and graceful degradation.
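The "exponential backoff" deliverable above could be sketched like this; it is a generic retry wrapper under the usual assumptions (doubling delay, a cap, and random jitter), not tied to any particular client library:

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Retry fn() with exponential backoff plus jitter.

    The delay doubles each attempt, capped at max_delay, with random jitter
    to avoid synchronized retries against a rate-limited API. Re-raises the
    last exception once attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.0))
```

In a course lab, the next step would be to narrow the `except` to retryable errors only (rate limits, timeouts) and to pair this with a per-session budget guard.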
Format and presentation
Use clear section headings, numbered steps, and concise explanations.
Provide code blocks labeled with language, file paths, and commands; include comments and docstrings.
Include time estimates per lab, acceptance criteria per task, and “definition of done” checklists.
Offer alternative paths: Python-only, Node-only, or mixed; note where choices diverge.
Constraints
Keep external dependencies minimal and mainstream; prefer well-documented libraries.
No hard-coded secrets; demonstrate .env usage and secret rotation.
Avoid region-locked services where possible; provide fallbacks.
Do not assume paid third-party services beyond OpenAI and a common relational DB; if used, provide free-tier alternatives.
Quality bar and verification
Every lab should run locally with copy-pasteable commands.
All sample code should pass included tests and linting.
Provide a short self-evaluation checklist mapping outcomes to activities and evidence of competency.
Output everything needed for a learner to complete the program end-to-end without further prompts: narrative, exercises, code, tests, templates, deployment steps, and rubrics.
I recently launched a SaaS in France with about ten clients, and since the release of GPT-Realtime my AI Agents no longer behave as before.
I’m facing a serious issue with repetitive answers; it’s horrible. The agent no longer understands simple YES or NO responses from users. Instead, it keeps repeating the same sentences over and over without actually processing the input.
My code has already been reviewed by senior developers, so the issue doesn’t come from the implementation. Even with a very simple script, the agents still repeat themselves.
Here’s an example of a conversation:
Agent: Hello, how can I help you?
Caller: I’m calling to report a sewer leak.
Agent: Hello, how can I help you?
Caller: I’d like to report a leak.
Agent: Hello, how can I help you?
Caller: I’m calling to report a sewer leak.
Agent: Thank you for your report. Could you please specify the exact location of the sewer leak?
Has anyone else experienced this? Any ideas on what might be causing the problem?
Provide a transcription model (session.update.session.audio.input.transcription.model=gpt-4o-mini-transcribe) and look for conversation.item.input_audio_transcription.delta messages. This will tell you whether the model is receiving noise or a good audio signal, regardless of whether the realtime model understands it. Based on the results from this, I think there's a legitimate bug, but it's hard to repro.
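The session.update event described above could be built like this. The nested shape simply mirrors the field path given in the post (session.audio.input.transcription.model) and may differ between Realtime API versions, so treat it as a sketch rather than a canonical payload:

```python
import json

# Build a session.update event enabling input-audio transcription, following
# the field path quoted above. The exact session shape depends on which
# Realtime API version your session uses.
session_update = {
    "type": "session.update",
    "session": {
        "audio": {
            "input": {
                "transcription": {"model": "gpt-4o-mini-transcribe"}
            }
        }
    },
}

payload = json.dumps(session_update)
# ws.send(payload)  # 'ws' stands for your open Realtime websocket (not shown)
# Then watch incoming events of type
# "conversation.item.input_audio_transcription.delta" to see what the model
# actually heard.
```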
I have found two things that seem to help this a lot (but don't eradicate it): (1) accumulate one or two seconds of input audio BEFORE you send the first input_audio_buffer.append event. You can stream small chunks after that, but make the first input_audio_buffer.append on the websocket longer. Or, alternatively, (2) send two seconds of silence before your first audio chunk.
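Workaround (2) above can be sketched as follows. This assumes the session uses 24 kHz mono PCM16 audio; if your session is configured differently, adjust the sample rate and byte width accordingly:

```python
import base64
import json

# Prime the input buffer with ~2 s of silence before the first real chunk,
# per workaround (2). PCM16 silence is just zero bytes.
SAMPLE_RATE = 24000          # samples per second (assumed session format)
SECONDS_OF_SILENCE = 2
BYTES_PER_SAMPLE = 2         # 16-bit PCM

silence = bytes(SAMPLE_RATE * SECONDS_OF_SILENCE * BYTES_PER_SAMPLE)
append_event = {
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(silence).decode("ascii"),
}

# ws.send(json.dumps(append_event))  # send before your first real audio chunk
```

For workaround (1), the same idea applies: buffer your microphone frames locally until you have one to two seconds accumulated, then send them as a single first append event.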
I’ve only used a few cents and have $5.00 loaded up. I’ve tried several realtime model names and they all do this. What’s really weird is that it works just fine in the playground with the same user/API key. Any ideas what might be happening?
{
  "error": {
    "message": "Voice marin is not available for your organization.",
    "type": "invalid_request_error",
    "param": "voice",
    "code": "invalid_value"
  }
}
The challenge with Azure is that while they do have the GA model, the API version is still in preview. As a result, none of the new constructs, such as the updated session object structure and the new voices, are working; they are rejected at the API validation step itself.
Hence I'm curious when they will update their API version.
The paperclip appeared in the interface after a while, but it is not possible to select and add a text file in .txt, .doc, or .pdf format as context. Does anyone understand why?