When should we expect price reductions for Whisper, and an update to Whisper v3 (which you guys launched more than a year ago) in the API? Please send Whisper some love.
Any plans to make prompt caching better? Specifically: let us specify which content to cache, give it a variable name, and reference that cache in subsequent API calls for a set time duration. I know another company is doing this.
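Something along these lines would be the dream. To be clear, this is a purely hypothetical sketch; the `cache` and `cache_ref` fields don't exist in today's API, they're just what I'm imagining:

```ts
// Purely hypothetical request shapes for named, reusable prompt caches.
// None of these fields exist in the current OpenAI API.

// 1) Cache a large shared context under a name, with a TTL.
const createCacheRequest = {
  model: "gpt-4o",
  cache: { name: "product-docs-v1", ttl_seconds: 3600 }, // imagined field
  messages: [
    { role: "system", content: "<large shared context goes here>" },
    { role: "user", content: "First question about the docs." },
  ],
};

// 2) Later calls reference the cache by name instead of resending the context.
const followUpRequest = {
  model: "gpt-4o",
  cache_ref: "product-docs-v1", // imagined field
  messages: [{ role: "user", content: "Follow-up question." }],
};
```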
Great updates on Realtime API.
- Any timeline for when the issue with the Realtime model dropping out of voice mode when reconstructing conversation history will be resolved? (Sketch of what I mean below.)
- Also, the audio randomly getting cut off at the end from time to time.
These two are probably the most critical issues of the API right now.
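By "reconstructing conversation history" I mean replaying prior turns as conversation items when a new Realtime session opens, roughly like this minimal sketch over the WebSocket event protocol (double-check the exact event payloads against the current preview docs):

```ts
import WebSocket from "ws";

// Minimal sketch: open a Realtime session and replay prior turns as
// conversation items before asking for a spoken response.
const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  }
);

const priorTurns = [
  { role: "user" as const, text: "What's on my calendar today?" },
  { role: "assistant" as const, text: "You have two meetings this afternoon." },
];

ws.on("open", () => {
  // Replay each prior turn as a text conversation item.
  for (const turn of priorTurns) {
    ws.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "message",
        role: turn.role,
        // User items use "input_text"; assistant items use "text".
        content: [{
          type: turn.role === "user" ? "input_text" : "text",
          text: turn.text,
        }],
      },
    }));
  }
  // Ask for a spoken response; this is where the model sometimes falls back to text only.
  ws.send(JSON.stringify({
    type: "response.create",
    response: { modalities: ["audio", "text"] },
  }));
});
```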
Well, we got o1 with vision via the API (with reasoning guide params). I'm happy :') ty guys
We don’t have plans for a Sora API yet, but we’d love to hear more! What will you build if we ship one?
Since this is an AMA for the API team, I wonder: do you guys have plans to release documentation examples of frontend microphone usage in different languages and frameworks that work seamlessly, published in a GitHub repository?
I feel like that would help many of us build more reliable features and integrate faster.
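Even a short browser snippet like this would save a lot of trial and error. This is my own sketch of capturing the mic and pushing 16-bit PCM into the Realtime API's input_audio_buffer.append event; the sendEvent callback is assumed to write JSON to an already-open WebSocket or data channel:

```ts
// Browser-side sketch: capture microphone audio, convert it to 16-bit PCM,
// and base64-encode it for the Realtime API's input_audio_buffer.append event.
// `sendEvent` is an assumed callback that writes JSON to an open connection.
async function streamMicrophone(sendEvent: (event: object) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioCtx = new AudioContext({ sampleRate: 24000 }); // Realtime expects 24 kHz PCM16
  const source = audioCtx.createMediaStreamSource(stream);
  const processor = audioCtx.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (e) => {
    const float32 = e.inputBuffer.getChannelData(0);
    // Convert float32 samples in [-1, 1] to little-endian 16-bit PCM.
    const pcm16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      const s = Math.max(-1, Math.min(1, float32[i]));
      pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    const bytes = new Uint8Array(pcm16.buffer);
    let binary = "";
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
    sendEvent({ type: "input_audio_buffer.append", audio: btoa(binary) });
  };

  source.connect(processor);
  processor.connect(audioCtx.destination);
}
```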
We’re seeing a lot of developers building with the Assistants API, and we’re continuing to invest in the tools developers need to build agents. Next year, we plan to bring o1 to the Assistants API and will have more to share soon!
Ooh. I would love to automate building relaxing, tranquil looping background videos to go alongside my custom-made music.
I’d also like to build N videos for a prompt and be able to approve them for future stock videos. Maybe even incorporate a vision model to somehow rank them before being sent to me for approval
Can’t wait for O1 with vision!
Really think ANY business can profit from this, as all of them deal with badly formatted, scanned-in, or handwritten PDFs.
Looking forward to replacing a pipeline of 20 LLM calls with a single call.
That leads me to my question:
with the release of better models (GPT-3 > GPT-4 > GPT-4o > o1), do you see a drop in overall tokens used via the API as people replace longer prompts or multiple inference calls with simpler prompts and fewer calls?
What is OpenAI’s view on the Model Context Protocol?
It would be great if everyone is on the same page about how we’ll build external connections to OpenAI APIs going forward.
Ideally you guys would come out and support it.
More generally, what are we as developers not doing as much as you think we should? What do you wish we did differently, or more or less of? We take constructive criticism too
For Sora? Use o1 to write a story or a set of scenes, then chain requests to the Sora API.
Nothing to share yet on V3 Whisper in the API. But for both audio understanding and TTS, do check out the new GPT-4o mini audio preview model. It’s got state of the art speech understanding and you can prompt the model directly to control how it hears and speaks! For example, give it a prompt like "Say the following in a somber tone, and make sure to pause your speech appropriately: "
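A minimal sketch of calling the audio preview model with a prompt like that (the model and voice names here are examples; check the docs for the current snapshot):

```ts
import OpenAI from "openai";
import { writeFileSync } from "node:fs";

const client = new OpenAI();

// Sketch: ask the audio preview model to speak with a specific delivery.
// Model and voice names are examples; check the docs for current values.
const completion = await client.chat.completions.create({
  model: "gpt-4o-mini-audio-preview",
  modalities: ["text", "audio"],
  audio: { voice: "alloy", format: "wav" },
  messages: [
    {
      role: "user",
      content:
        'Say the following in a somber tone, and make sure to pause your speech appropriately: "<your text here>"',
    },
  ],
});

// The spoken reply comes back base64-encoded on the message's audio field.
const audioData = completion.choices[0].message.audio?.data;
if (audioData) writeFileSync("reply.wav", Buffer.from(audioData, "base64"));
```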
If you wish to discuss a response in detail, please create a new thread with a link to the reply in the body (using the link icon), to avoid filling up the AMA thread.
One more for the Assistants API: it would be really great if the Realtime API could interact with assistants. That would enable really cool, tailored interactive scenarios for users.
Where is the code that Sean DuBois showed for the WebRTC client?
It’s something we care about! Giving the model more context and examples is a great way to get smarter responses. Nothing to announce just yet but stay tuned in 2025
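(For anyone reading along, "more context and examples" just means few-shot prompting; a tiny illustration, not an official recipe:)

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Few-shot prompting: include a couple of worked examples in the messages
// so the model sees the exact format and style you want back.
const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "Classify the sentiment of each review as positive or negative." },
    { role: "user", content: "Review: The battery lasts all day." },
    { role: "assistant", content: "positive" },
    { role: "user", content: "Review: It broke after a week." },
    { role: "assistant", content: "negative" },
    { role: "user", content: "Review: Setup took five minutes and it just works." },
  ],
});

console.log(completion.choices[0].message.content); // expected: "positive"
```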