Since this is an AMA for the API team, I wonder: do you have plans to release documentation examples of frontend microphone usage in different languages and frameworks, published as working examples in a GitHub repository?
I feel like that would help many of us build more reliable features and integrate faster.
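For illustration, here’s a minimal sketch of the kind of example I mean: capturing microphone audio in the browser with the standard getUserMedia / MediaRecorder Web APIs. The upload URL at the end is a placeholder for whatever backend would forward the audio to the API.

```typescript
// Minimal browser microphone capture sketch using only standard Web APIs.
// Where the recording is sent afterwards is an assumption; adapt to your backend.
async function recordMicrophone(durationMs: number): Promise<Blob> {
  // Ask the user for microphone access.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);

  // Record for the requested duration, then stop and release the microphone.
  recorder.start();
  await new Promise((resolve) => setTimeout(resolve, durationMs));
  recorder.stop();
  await new Promise((resolve) => (recorder.onstop = resolve));
  stream.getTracks().forEach((track) => track.stop());

  // A single Blob with the recorded audio (the codec depends on the browser).
  return new Blob(chunks, { type: recorder.mimeType });
}

// Usage: record five seconds and POST it to a placeholder backend route.
recordMicrophone(5000).then((audio) => {
  const body = new FormData();
  body.append("file", audio, "recording.webm");
  return fetch("/api/upload-audio", { method: "POST", body });
});
```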
We’re seeing a lot of developers building with the Assistants API, and we’re continuing to invest in the tools developers need to build agents. Next year, we plan to bring o1 to the Assistants API and will have more to share soon!
Ooh. I would love to automate building relaxing, tranquil looping background videos to go alongside my custom-made music.
I’d also like to generate N videos for a prompt and be able to approve them as future stock videos. Maybe even incorporate a vision model to rank them before they’re sent to me for approval.
Can’t wait for o1 with vision!
I really think ANY business can profit from this, as all of them deal with badly formatted, scanned-in, or hand-annotated PDFs.
Looking forward to replacing a pipeline of 20 LLM calls with a single call.
That leads me to my question:
with the release of better models (GPT-3 > GPT-4 > GPT-4o > o1), do you see a noticeable drop in overall tokens used via the API as people replace longer prompts or multiple inference calls with simpler prompts or fewer calls?
More generally, what are we as developers not doing as much as you think we should? What do you wish we did differently, or more or less of? We take constructive criticism too.
Nothing to share yet on Whisper v3 in the API. But for both audio understanding and TTS, do check out the new GPT-4o mini audio preview model. It’s got state-of-the-art speech understanding, and you can prompt the model directly to control how it hears and speaks! For example, give it a prompt like "Say the following in a somber tone, and make sure to pause your speech appropriately: "
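To make that concrete, here is a rough sketch with the Node SDK. The model snapshot, voice, and the sentence being read out are placeholders; substitute whatever you actually have access to.

```typescript
import OpenAI from "openai";
import { writeFileSync } from "node:fs";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  // Ask the audio preview model to speak with a specific tone.
  // Model snapshot, voice, and the spoken sentence are illustrative placeholders.
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini-audio-preview",
    modalities: ["text", "audio"],
    audio: { voice: "alloy", format: "wav" },
    messages: [
      {
        role: "user",
        content:
          "Say the following in a somber tone, and make sure to pause your " +
          "speech appropriately: Thank you for everything. Goodbye for now.",
      },
    ],
  });

  // The spoken reply comes back base64-encoded alongside a text transcript.
  const audio = response.choices[0].message.audio;
  if (audio) {
    writeFileSync("reply.wav", Buffer.from(audio.data, "base64"));
  }
}

main();
```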
If you wish to discuss a response in detail, please create a new thread with a link to the reply in the body, to avoid filling up the AMA thread.
One more for the Assistants API: it would be really great to have the Realtime API able to interact with assistants. That would enable really cool, tailored interactive scenarios for users.
It’s something we care about! Giving the model more context and examples is a great way to get smarter responses. Nothing to announce just yet, but stay tuned in 2025!
I’m sorry if this question has been asked already, and I absolutely appreciate you all and the accomplishments you have contributed to so many communities and the world as a whole. However, I’m wondering: is o1 now available in the Playground under Chat? At this time it’s still only showing o1-preview.
Is an ephemeral API key subject to abuse? For example, could a bad actor scrape a website every minute to keep getting new ephemeral keys to use for their own purposes?
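For context, here is a minimal sketch of the minting pattern the question is about, with a naive per-IP rate limit as one possible guard. The endpoint path, model name, and limits are assumptions on my part, not an official mitigation.

```typescript
import { createServer } from "node:http";

// Naive in-memory per-IP limiter: at most one ephemeral key per minute.
// (A real deployment would also gate this route behind its own auth/session.)
const lastMint = new Map<string, number>();
const MINT_INTERVAL_MS = 60_000;

const server = createServer(async (req, res) => {
  if (req.method !== "POST" || req.url !== "/ephemeral-key") {
    res.writeHead(404).end();
    return;
  }

  const ip = req.socket.remoteAddress ?? "unknown";
  const now = Date.now();
  if (now - (lastMint.get(ip) ?? 0) < MINT_INTERVAL_MS) {
    res.writeHead(429).end("Slow down");
    return;
  }
  lastMint.set(ip, now);

  // Mint a short-lived client secret for the browser. Endpoint path and model
  // name are assumptions; check the Realtime API reference for the real values.
  const upstream = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-4o-realtime-preview", voice: "verse" }),
  });

  res.writeHead(upstream.status, { "Content-Type": "application/json" });
  res.end(await upstream.text());
});

server.listen(3000);
```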
I’ve got an application we’ve developed on the Assistants API, primarily because of how easy it makes the knowledge base and file uploads. Would it be possible to replicate that functionality with the new function calling feature, so we can use the newest features like fine-tuning and realtime voice on that endpoint while still keeping the ability to do RAG for context?
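For what it’s worth, this is the rough shape I have in mind: exposing our own retrieval step as a function tool on Chat Completions. The search_knowledge_base tool and the searchMyKnowledgeBase helper are hypothetical, standing in for whatever vector store or file index we end up using.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical helper that queries your own vector store / search index.
async function searchMyKnowledgeBase(query: string): Promise<string> {
  // ...call pgvector, Pinecone, a file index, etc. and return the top passages.
  return `Relevant passages for "${query}" would go here.`;
}

async function answerWithRag(question: string): Promise<string> {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: "user", content: question },
  ];

  // Describe the retrieval step as a function tool so the model can request it.
  const first = await client.chat.completions.create({
    model: "gpt-4o",
    messages,
    tools: [
      {
        type: "function",
        function: {
          name: "search_knowledge_base",
          description: "Search the uploaded documents for relevant passages.",
          parameters: {
            type: "object",
            properties: { query: { type: "string" } },
            required: ["query"],
          },
        },
      },
    ],
  });

  const reply = first.choices[0].message;
  messages.push(reply);

  // If the model asked for retrieval, run it and feed the results back in.
  for (const call of reply.tool_calls ?? []) {
    const { query } = JSON.parse(call.function.arguments);
    messages.push({
      role: "tool",
      tool_call_id: call.id,
      content: await searchMyKnowledgeBase(query),
    });
  }

  const second = await client.chat.completions.create({ model: "gpt-4o", messages });
  return second.choices[0].message.content ?? "";
}
```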
It’s definitely worth retrying! Both GPT-4o and GPT-4o mini have improved meaningfully in multilingual understanding with the latest snapshots. We still use the same Whisper model to transcribe what the user said, but GPT-4o processes the audio directly and responds (without going through a transcription). Would love to hear what you find!
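If it helps while retesting: the same direct audio understanding is also exposed through the audio preview model on Chat Completions, so here is a minimal sketch of passing the user’s recording straight to the model. The model snapshot, voice, and file name are placeholders.

```typescript
import OpenAI from "openai";
import { readFileSync } from "node:fs";

const client = new OpenAI();

async function main() {
  // Read a local recording and send it to the model as base64 audio input.
  // "question.wav" and the model snapshot are illustrative placeholders.
  const base64Audio = readFileSync("question.wav").toString("base64");

  const response = await client.chat.completions.create({
    model: "gpt-4o-audio-preview",
    modalities: ["text", "audio"],
    audio: { voice: "alloy", format: "wav" },
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Answer the question in this recording, in the same language." },
          { type: "input_audio", input_audio: { data: base64Audio, format: "wav" } },
        ],
      },
    ],
  });

  // Text transcript of the spoken reply (the audio itself is in audio.data).
  console.log(response.choices[0].message.audio?.transcript);
}

main();
```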
Interesting question, and not one I've encountered before! It really depends on your schema and what you're looking for. You may want to consider a different schema for your second response, which has something like a "continuation" key to make it clear to the model that it is supposed to add to its earlier response.
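As a hedged sketch of what that could look like with Structured Outputs (the schema name and keys are made up for illustration):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Illustrative second-turn schema: a "continuation" key signals to the model
// that it should extend its earlier answer rather than restate it.
const continuationSchema = {
  name: "continued_answer",
  strict: true,
  schema: {
    type: "object",
    properties: {
      continuation: {
        type: "string",
        description: "Additional content that picks up where the previous response ended.",
      },
      is_complete: {
        type: "boolean",
        description: "True once nothing further needs to be added.",
      },
    },
    required: ["continuation", "is_complete"],
    additionalProperties: false,
  },
};

// previousJson is the model's earlier structured response, passed back verbatim.
async function continueAnswer(previousJson: string) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "assistant", content: previousJson },
      { role: "user", content: "Continue from where you left off." },
    ],
    response_format: { type: "json_schema", json_schema: continuationSchema },
  });
  return JSON.parse(response.choices[0].message.content ?? "{}");
}
```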
When can we expect the voices from Advanced Voice Mode to be made available in the API? Right now they are two different sets of voices. And when will you make tools available to control the voice’s emotion, tone, intonation, etc.?