AMA on the 17th of December with OpenAI's API Team: Post Your Questions Here

We’ve heard a lot of demand for search and browse capabilities in the API – especially from folks who want to build applications that are grounded in super recent information. We’re actively working on this and expect to launch something next year!

11 Likes

When will we be able to fine-tune voice models?
Is this something in the works or on the roadmap?

2 Likes

Hey OAI great release today!
For o1, is there a roadmap for in-CoT function calling to supplement the reasoning context? Is this more of an Assistants update, or is this the super-secret Agents project?

We are always working on decreasing the prices of our models – so I would expect more here over time as well! I would also recommend trying the new 4o-mini realtime model we launched today, it’s quite a bit cheaper too. :slightly_smiling_face:

5 Likes

Thanks for the Go and Java SDKs. Any updates on an official OpenAI PHP SDK?

1 Like

Yes, you should be confident building with client.beta.chat.completions.parse for Pydantic Structured Outputs!

Also yes, we are working on bringing structured outputs (and also other useful API features) to all reasoning models, including the o-mini series. Stay tuned.

7 Likes

If you’re referring to the “generate” feature that auto-generates a json schema for you – this is just a simple call to our chat completions API with a specific prompt! You can actually see the prompt we use in our documentation: https://platform.openai.com/docs/guides/prompt-generation#meta-schemas
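A rough standard-library-only sketch of that idea; `META_PROMPT` below is a hypothetical condensed stand-in for the documented prompt, and the model name is illustrative:

```python
import json
import os
import urllib.request

# Hypothetical stand-in for the meta-prompt shown in the docs.
META_PROMPT = (
    "Generate a strict JSON Schema for the task the user describes. "
    "Respond with only the schema."
)


def build_request(task_description: str) -> dict:
    # The "generate" feature is just a chat completions call with a
    # schema-writing system prompt in front of the user's description.
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": META_PROMPT},
            {"role": "user", "content": task_description},
        ],
    }


def generate_schema(task_description: str) -> dict:
    # Plain HTTPS call to the chat completions endpoint.
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_request(task_description)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```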

6 Likes

Congrats! That's amazing. I was super happy with the chart shown today about o1's structured outputs performance!


oh, and thank you so much for the reply!

Today, the best way is with a tool call that you use to trigger o1 (probably using the new out-of-band conversation feature). We'll keep investing in making it easier to use more intelligence within the Realtime API.
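A sketch of the two event payloads that pattern involves, assuming a hypothetical `ask_reasoner` tool and out-of-band responses via `"conversation": "none"` on `response.create`; treat the exact shapes as illustrative:

```python
import json


def o1_tool_def() -> dict:
    # Hypothetical tool the realtime session exposes; when the voice model
    # calls it, your server forwards the question to o1.
    return {
        "type": "function",
        "name": "ask_reasoner",
        "description": "Delegate a hard question to a slower reasoning model.",
        "parameters": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    }


def out_of_band_response(instructions: str) -> str:
    # conversation="none" asks the Realtime API to generate this response
    # without appending it to the default conversation (out of band).
    return json.dumps({
        "type": "response.create",
        "response": {"conversation": "none", "instructions": instructions},
    })
```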

Re: video, stay tuned; it'll come next year. The model has been trained for this capability, so in the meantime you can experiment with it in ChatGPT.

5 Likes

Is there any chance of getting audio I/O capability in the Assistants API?
It would be very useful to be able to send audio to a specific assistant and get audio + text as output. Realtime capability would be great too, but basic multimodal support would already simplify things a lot.

Any new tools like Neuron Viewer?

Any chances Neuron Viewer can be updated with a newer model?

2 Likes

Why does the Realtime API sound so much worse in languages other than English, compared to the audio-to-audio Advanced Voice Mode in the ChatGPT app?

Nothing to share just yet. We focused the first pass of prompt caching on making it as easy as possible to use (no API changes needed, no extra cost for cache writes). It’s a feature we care a lot about though … would be curious how you’d want to use a more structured cache?
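Since caching keys on exact prompt prefixes today, the main lever you control is ordering: keep the long static content byte-identical and first, and append per-request content last. A minimal sketch, with the prompt text as placeholders:

```python
# Long static preamble: identical bytes on every request, so the cached
# prefix (caching applies once the prompt passes a minimum length) is reused.
STATIC_SYSTEM_PROMPT = "You are a support agent for Acme. <policy docs>"
FEW_SHOT = [
    {"role": "user", "content": "Example question"},
    {"role": "assistant", "content": "Example answer"},
]


def build_messages(user_query: str) -> list[dict]:
    # Dynamic content goes last so it never invalidates the shared prefix.
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        *FEW_SHOT,
        {"role": "user", "content": user_query},
    ]
```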

3 Likes

The demo itself used a board from https://sonatino.com. You could use any ESP32-S3, though!

It was connected to a Sennheiser Enterprise Solution SP 20 ML that I got off eBay. You can just use any headphones, though.

The stuffed toy is the "Record Your Own Plush 16 inch Reindeer" from Walmart.

10 Likes

We made a bunch of improvements to audio truncations with the new models… would be curious if you still see problems there?

It's still possible to get the model stuck in text-only mode if you give it a huge amount of text upfront. This is a known issue that we'll keep improving in future releases. In the meantime, putting audio in the latest user turns can help the model rediscover its voice.
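For that workaround, assuming the Realtime API's `conversation.item.create` event shape for audio user messages (treat the exact field names as illustrative):

```python
import base64
import json


def audio_user_turn(pcm16_bytes: bytes) -> str:
    # Appending a user message whose content is audio (not text) nudges the
    # model back toward producing audio output on its next response.
    return json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{
                "type": "input_audio",
                "audio": base64.b64encode(pcm16_bytes).decode(),
            }],
        },
    })
```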

6 Likes

Do prompt-engineering techniques like using delimiters, JSON, or Markdown in the input improve inference quality over the API?

Thanks for your response, it is much appreciated!

I'll give the new mini with audio capabilities a try.

Do you guys have benchmarks showing how good it is for function calling? Our product is based around that so if the mini is less accurate, we’d have to stick to the full model for now.

Thanks again :heart:

1 Like

For the demo I used a board from https://sonatino.com. I have used the Realtime API Embedded SDK on multiple ESP32-S3 devices. You might even be able to use it on other ESP32s; performance-wise, I had a decent amount of overhead left.

5 Likes

Where can we access the code?

We don’t have a current roadmap for response pre-filling, but we will keep this in mind! For the DPO datasets, this can be obtained through human annotation, or some kind of A/B testing flow. For synthetic data generation, you can also explore some kind of rejection sampling with a model and an evaluation to help generate preferred and non-preferred outputs from the same prompt.
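One way to sketch the rejection-sampling idea: sample several candidates per prompt, score them with whatever evaluator you trust, and pair the best and worst into a preference record. The JSONL field names below are illustrative, not a guaranteed training format:

```python
import json
from typing import Callable


def make_preference_record(
    prompt: str,
    sample: Callable[[str], str],        # e.g. a model call at temperature > 0
    score: Callable[[str, str], float],  # your evaluator: higher is better
    n: int = 4,
) -> str:
    # Rejection sampling: draw n candidates, rank them, keep the extremes
    # as the preferred / non-preferred pair for the same prompt.
    candidates = [sample(prompt) for _ in range(n)]
    ranked = sorted(candidates, key=lambda c: score(prompt, c))
    return json.dumps({
        "input": {"messages": [{"role": "user", "content": prompt}]},
        "preferred_output": [{"role": "assistant", "content": ranked[-1]}],
        "non_preferred_output": [{"role": "assistant", "content": ranked[0]}],
    })
```

Swap in a real sampler and evaluator (a reward model, an LLM judge, or A/B click data) and write one record per line to build the DPO file.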

3 Likes