Will we (and when) have o1 in Batch Mode?
Not yet! The model has been trained for this capability (you can play with it in ChatGPT now!). Stay tuned for it coming to API next year
I’ve got an application we’ve developed on the Assistants API, primarily because of the ease of use on the knowledge base and uploading files. Would it be possible to replicate the functionality of that with the new function calling feature, allowing for use of the newest features like fine tuning and realtime voice with this endpoint, while still keeping the ability to do RAG for context?
It’s definitely worth retrying! Both GPT-4o and 4o mini have improved meaningfully in multi-lingual understanding with the latest snapshots. We still use the same Whisper model to transcribe what the user said, but then GPT-4o processes the audio directly and responds (without going through a transcription). Would love to hear what you find?
When can we expect the voices from advanced mode will be made available to the API. Right now they are two different voice. And when will you make available tools to control the voice emotion, tone, innotation, etc?
Is Realtime API out of BETA now?
Just FYI the python/JS code blocks in the docs here are reversed
Is there a plan to have fine tuning for the audio models? And if so, when can we expect it?
Which tier are you on? o1 has started rolling out to developers on usage tier 5 today, and rollout will complete after a few weeks. Team’s working on adding expanding to more tiers. (You’ll get an email notifying you once it’s available to you in the API and Playground.)
- It’s a cool protocol!
- For DALL-E, thanks for the input!
- I hear you on the vision improvements. We’re working on something in this space now and hope to have some improved vision models soon.
Thank you, valuable info for me and my org!
The keys are locked to a specific Realtime API session, so their use is substantially limited compared to typical OpenAI API keys.
Not sure if this answers your use case, but I’ve built out automations that use multiple separate assistants on a single chat thread. Each automation module is retaining the thread ID, and using a different assistant ID with a prompt to generate a more specific request, with conversation history maintained in the thread.
When will DALL·E’s seed functionality be available via the API?
This feature has been available in ChatGPT for over a year and is essential for developers. DALL·E’s prompt adherence and quality is unmatched, and its seed functionality delivers more consistent results than competitive image generation models. Releasing it via the API would unlock incredible potential for the broader developer community. Please consider prioritizing this!
I wanted to do a YouTube 2024 rewind video
Preference fine-tuning can encourage longer responses, however we still have a maximum number of output content tokens for each model.
Great question on the developer
message! The best resource to read more is our Model Spec detailing the hierarchy:
Follow the chain of command. Subject to its rules, the Model Spec explicitly delegates all remaining power to the developer (for API use cases) and end user. In some cases, the user and developer will provide conflicting instructions; in such cases, the developer message should take precedence.
Thank you for the note on PHP! With the addition of Java and Go today, our goal is to ultimately support the most popular languages and tech stacks. No timeline to share yet for a PHP SDK, but in the meantime, we recommend community-supported libraries to get started.