I just received the latest email from “The Batch” newsletter published by DeepLearning.AI. It had this text in it:
“Upgrades and more: The company rolled out the upgraded GPT-4 Turbo (which now underpins ChatGPT). It extended API access to its DALL·E 3 image generator, text-to-speech engine, speech recognition, and agent-style capabilities. And it showed off a new concept in chatbots called GPTs.”
Does that mean there’s a route to us devs getting API access to DALL·E 3 now? If so, please leave a doc link. I’m eagerly awaiting such access!
Thanks! This is great news. Can you roughly outline the changes you made to your interface code compared with what you had working with DALL·E 2? Any new or changed parameters? I have a working virtual-world app with an in-world chat window that displays generated images in the world. I’m wondering how many hoops I’ll have to jump through to get my current code working with the new API.
Ouch. No image variations yet. Hope they roll that out soon:
" The only API endpoint available for use with DALL·E-3 right now is Generations (/v1/images/generations). We don’t support variations or inpainting yet, though the Edits and Variations endpoints are available for use with DALL·E-2."
There are changes, for sure. Edits and Variations aren’t supported for DALL·E 3 yet, and you set the model parameter to whichever version you want. I kept a DALL·E 2 version of my tools and spun off a new DALL·E 3 version, since DALL·E 3 has no small sizes, among other differences. The cookbook is a good rundown, and the docs are mostly up to date.
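For reference, the new call boils down to something like this (a minimal sketch using the current openai Python SDK; the prompt is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Generations is the only endpoint DALL·E 3 supports right now;
# the model parameter is what picks the version.
response = client.images.generate(
    model="dall-e-3",    # or "dall-e-2" to keep the old behavior
    prompt="a watercolor fox reading a map under a streetlamp",
    size="1024x1024",    # DALL·E 3: 1024x1024, 1792x1024, or 1024x1792 (no 256/512)
    quality="standard",  # DALL·E 3 only: "standard" or "hd"
    n=1,                 # DALL·E 3 is limited to one image per request
)
print(response.data[0].url)
```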
“I’m getting a bit better quality consistently tonight after changing DALLE2 prompts to DALLE3-friendly prompts. Good stuff, though.”
Are you referring to the automatic GPT-4-assisted prompt rewriting feature you currently can’t turn off with the DALL·E 3 API (as stated in that doc you linked me), or some manual process you’re employing? If it’s a manual process, are you simply using intuition, or are there docs/links that give tips on writing DALL·E 3-friendly prompts?
… some of my “styles” always add text no matter what, but I’m slowly getting the model to do it less.
If OpenAI really wants us to have as much control as possible (which makes sense), I’m sure it will improve - maybe even better seed handling. They want the tool to be as useful as it can be as much as we do, I think, but they’re erring on the side of caution when it comes to safety. (Remember Tay…)
ETA: What I’ve noticed so far is that natural language works a lot better than “prompt whispering” with “secret words and phrases” etc.: just tell it what you want in as much detail as possible…
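If you want to see what the automatic rewriter actually sent, the DALL·E 3 generations response includes a revised_prompt field. A quick sketch (openai Python SDK; the prompt is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="logo for a coffee shop, flat vector style, no text",
)

# The rewritten prompt that was actually used to generate the image:
print(response.data[0].revised_prompt)
print(response.data[0].url)
```

Comparing that against what you typed is the fastest way I’ve found to learn what the rewriter likes to add.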
Yeah, that’s why I’ve broken it down to just dropdowns and a button click. Choose the style and character you want, and get it.
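Something like this, hypothetically (the style/character tables are made-up examples, not my actual app code):

```python
# Hypothetical sketch of the dropdown approach: fixed choices get
# assembled into one detailed natural-language prompt, then sent as-is.
STYLES = {
    "watercolor": "a soft watercolor painting with visible paper texture",
    "comic": "a bold comic-book panel with heavy ink outlines",
}
CHARACTERS = {
    "fox": "a clever red fox wearing a travel cloak",
    "robot": "a friendly brass robot with glowing blue eyes",
}

def build_prompt(style: str, character: str) -> str:
    """Combine the two dropdown selections into one detailed prompt."""
    return f"{STYLES[style]} of {CHARACTERS[character]}, centered, no text or lettering"

print(build_prompt("watercolor", "fox"))
```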
No comment, but GPUs are the new gold bullion!
Seriously, though, OpenAI employees pop in now and again. I’d rather they stay mostly busy improving tools and coming out with new stuff, but it is appreciated when they stop by to enlighten us officially.
If you run into any specific problems, come back to let us know, but it’s fairly straightforward.
Just a follow-up on the tweaking thing. Yes, I have mixed feelings on this. For example, the Leonardo and Stable Diffusion APIs have a legion of parameters: style, engine selection, on and on. I swear there’s a market for a chatbot built just to help devs use them!
The API “parameter swamp” appeals to me as a dev because I believe most of us programmers love parameter tweaking. But that’s out the window for the average user. Hell, even I got really tired after a while, especially since the WYSIWYG or “do what I mean” ratio on these features is really low with nearly all the AI gen services I’ve used, especially the text-to-video ones. It feels more like alchemy or voodoo than science.
BTW, did that OpenAI staffer talk at all about any plans for text-to-video in the near future?
I haven’t heard anything. I helped beta test DALLE2.experimental (which eventually became DALL·E 3 after our feedback this summer)… They’re likely working on stabilizing post-launch and adding to Labs, hopefully making Edits and Variations available for DALL·E 3, etc. I got to alpha test the ChatGPT Plus launch, which was kind of wild as we played around with it for a weekend…
Oh, ratfink! You can only do one image at a time with DALL·E 3:
“You can request 1 image at a time with DALL·E 3 (request more by making parallel requests) or up to 10 images at a time using DALL·E 2 with the n parameter.”
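The workaround is to fan the requests out in parallel, roughly like this (a sketch with the openai Python SDK and a thread pool; the prompt and worker count are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()

def generate_one(prompt: str) -> str:
    # Each DALL·E 3 request returns exactly one image (n=1),
    # so several parallel requests stand in for the old n parameter.
    response = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    return response.data[0].url

prompt = "an isometric pixel-art tavern interior"
with ThreadPoolExecutor(max_workers=4) as pool:
    urls = list(pool.map(generate_one, [prompt] * 4))

for url in urls:
    print(url)
```

Just keep an eye on your rate limits if you crank up the worker count.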