AMA on the 17th of December with OpenAI's API Team: Post Your Questions Here

We’ve heard a lot of demand for search and browse capabilities in the API – especially from folks who want to build applications that are grounded in super recent information. We’re actively working on this and expect to launch something next year!

11 Likes

When will we be able to fine-tune voice models?
Is this something in the works or on the roadmap?

2 Likes

Hey OAI great release today!
For o1, is there a roadmap for in-CoT function calling to supplement the reasoning context? Is this more of an Assistants update, or is this the super-secret Agents project?

We are always working on decreasing the prices of our models – so I would expect more here over time as well! I would also recommend trying the new 4o-mini realtime model we launched today, it’s quite a bit cheaper too. :slightly_smiling_face:

5 Likes

Thanks for the Go and Java SDKs. Any updates on an official OpenAI PHP SDK?

1 Like

Yes, you should be confident building with client.beta.chat.completions.parse for Pydantic Structured Outputs!

Also yes, we are working on bringing structured outputs (and also other useful API features) to all reasoning models, including the o-mini series. Stay tuned.

7 Likes

If you’re referring to the “generate” feature that auto-generates a json schema for you – this is just a simple call to our chat completions API with a specific prompt! You can actually see the prompt we use in our documentation: https://platform.openai.com/docs/guides/prompt-generation#meta-schemas
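A rough standard-library-only sketch of that idea; `META_PROMPT` below is a hypothetical condensed stand-in for the documented prompt, and the model name is illustrative:

```python
import json
import os
import urllib.request

# Hypothetical stand-in for the meta-prompt shown in the docs.
META_PROMPT = (
    "Generate a strict JSON Schema for the task the user describes. "
    "Respond with only the schema."
)


def build_request(task_description: str) -> dict:
    # The "generate" feature is just a chat completions call with a
    # schema-writing system prompt in front of the user's description.
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": META_PROMPT},
            {"role": "user", "content": task_description},
        ],
    }


def generate_schema(task_description: str) -> dict:
    # Plain HTTPS call to the chat completions endpoint.
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_request(task_description)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```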

6 Likes

Congrats! That's amazing. I was super happy with the chart shown today about o1's structured outputs performance!


oh, and thank you so much for the reply!

Today, the best way is with a tool call that you use to trigger o1 (probably using the new out-of-band conversation feature). We'll keep investing in making it easier to use more intelligence within the Realtime API.
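A sketch of the two event payloads that pattern involves, assuming a hypothetical `ask_reasoner` tool and out-of-band responses via `"conversation": "none"` on `response.create`; treat the exact shapes as illustrative:

```python
import json


def o1_tool_def() -> dict:
    # Hypothetical tool the realtime session exposes; when the voice model
    # calls it, your server forwards the question to o1.
    return {
        "type": "function",
        "name": "ask_reasoner",
        "description": "Delegate a hard question to a slower reasoning model.",
        "parameters": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    }


def out_of_band_response(instructions: str) -> str:
    # conversation="none" asks the Realtime API to generate this response
    # without appending it to the default conversation (out of band).
    return json.dumps({
        "type": "response.create",
        "response": {"conversation": "none", "instructions": instructions},
    })
```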

Re: video, stay tuned; it'll come next year. The model has been trained for this capability, so in the meantime you can experiment with it in ChatGPT.

5 Likes

Is there any chance of getting audio I/O capability in the Assistants API?
It would be very useful to be able to send audio to a specific assistant and get audio + text as output. Realtime capability would be great too, but basic multimodal support would already simplify things a lot.

Any new tools like Neuron Viewer?

Any chances Neuron Viewer can be updated with a newer model?

2 Likes

Why does the Realtime API sound so much worse in languages other than English, compared to the audio-to-audio Advanced Voice Mode in the ChatGPT app?

Nothing to share just yet. We focused the first pass of prompt caching on making it as easy as possible to use (no API changes needed, no extra cost for cache writes). It’s a feature we care a lot about though … would be curious how you’d want to use a more structured cache?
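Since caching keys on exact prompt prefixes today, the main lever you control is ordering: keep the long static content byte-identical and first, and append per-request content last. A minimal sketch, with the prompt text as placeholders:

```python
# Long static preamble: identical bytes on every request, so the cached
# prefix (caching applies once the prompt passes a minimum length) is reused.
STATIC_SYSTEM_PROMPT = "You are a support agent for Acme. <policy docs>"
FEW_SHOT = [
    {"role": "user", "content": "Example question"},
    {"role": "assistant", "content": "Example answer"},
]


def build_messages(user_query: str) -> list[dict]:
    # Dynamic content goes last so it never invalidates the shared prefix.
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        *FEW_SHOT,
        {"role": "user", "content": user_query},
    ]
```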

3 Likes

The demo itself used a board from https://sonatino.com. You could use any ESP32-S3, though!

It was connected to a Sennheiser Enterprise Solution SP 20 ML that I got off eBay. You can just use any headphones, though.

The stuffed toy is the "Record Your Own Plush 16 inch Reindeer" from Walmart.

10 Likes

We made a bunch of improvements to audio truncations with the new models… would be curious if you still see problems there?

It's still possible to get the model stuck in text-only mode if you give it a huge amount of text upfront. This is a known issue that we'll keep improving in future releases. In the meantime, putting audio in the latest user turns can help the model rediscover its voice.
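For that workaround, assuming the Realtime API's `conversation.item.create` event shape for audio user messages (treat the exact field names as illustrative):

```python
import base64
import json


def audio_user_turn(pcm16_bytes: bytes) -> str:
    # Appending a user message whose content is audio (not text) nudges the
    # model back toward producing audio output on its next response.
    return json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{
                "type": "input_audio",
                "audio": base64.b64encode(pcm16_bytes).decode(),
            }],
        },
    })
```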

6 Likes

Do prompt-engineering techniques like using delimiters, JSON, or Markdown in the input improve inference quality over the API?

Thanks for your response, it is much appreciated!

I'll give the new mini with audio capabilities a try.

Do you guys have benchmarks showing how good it is for function calling? Our product is based around that so if the mini is less accurate, we’d have to stick to the full model for now.

Thanks again :heart:

1 Like

For the demo I used a board from https://sonatino.com. I have used the Realtime API Embedded SDK on multiple ESP32-S3 devices. You might even be able to use it on other ESP32s; performance-wise, I had a decent amount of overhead left.

5 Likes

Where can we access the code?

We don’t have a current roadmap for response pre-filling, but we will keep this in mind! For the DPO datasets, this can be obtained through human annotation, or some kind of A/B testing flow. For synthetic data generation, you can also explore some kind of rejection sampling with a model and an evaluation to help generate preferred and non-preferred outputs from the same prompt.
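One way to sketch the rejection-sampling idea: sample several candidates per prompt, score them with whatever evaluator you trust, and pair the best and worst into a preference record. The JSONL field names below are illustrative, not a guaranteed training format:

```python
import json
from typing import Callable


def make_preference_record(
    prompt: str,
    sample: Callable[[str], str],        # e.g. a model call at temperature > 0
    score: Callable[[str, str], float],  # your evaluator: higher is better
    n: int = 4,
) -> str:
    # Rejection sampling: draw n candidates, rank them, keep the extremes
    # as the preferred / non-preferred pair for the same prompt.
    candidates = [sample(prompt) for _ in range(n)]
    ranked = sorted(candidates, key=lambda c: score(prompt, c))
    return json.dumps({
        "input": {"messages": [{"role": "user", "content": prompt}]},
        "preferred_output": [{"role": "assistant", "content": ranked[-1]}],
        "non_preferred_output": [{"role": "assistant", "content": ranked[0]}],
    })
```

Swap in a real sampler and evaluator (a reward model, an LLM judge, or A/B click data) and write one record per line to build the DPO file.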

3 Likes