API playground - image generation

I am using the playground to test features. I can easily use the chat or assistant part, but when I ask for image generation, I get the answer that it cannot generate images. (I tried different models, including ChatGPT-4o.)
So I understand that, if I need to create an image using the API, I have to use the “generations” endpoint.

How can I test image generation using this endpoint in the playground?

Also, if I am going to write an app which mainly uses the assistants endpoint to run a conversation, how can I do the transition from the assistants endpoint to the generations endpoint based on the user’s prompt?
I mean, when, throughout the conversation, the user asks for image generation, by default my app will go to the assistants API (the runs endpoint, after creating a message with the messages endpoint).
How can I understand that the prompt is about image generation?

Any ideas on this?

The only way to create an image is to send a prompt to DALL-E 2 or DALL-E 3 for generation - and then pay for each image, $0.02 up to $0.12 each.

You can provide a tool to the AI for it to call, but that just puts two rounds of AI between the input and the generation. It is better to make an image-specific interface, and then even give some help in the image creation process that becomes part of the prompt.

The creation is just as simple as the API reference shows, adding your own parameters from those available:

from openai import OpenAI
client = OpenAI()

imagedata = client.images.generate(
  model="dall-e-3",
  prompt="A cute baby sea otter",
)
print(imagedata.model_dump())

The default response is a pydantic object containing a URL where you can download the image for a short time, or you can specify response_format to receive base64-encoded image data in the object instead.
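
For example, a minimal sketch (the prompt and output file name are just placeholders) that requests base64 data and writes it to disk:

import base64
from openai import OpenAI

client = OpenAI()

# Request base64-encoded image data instead of a temporary URL
result = client.images.generate(
    model="dall-e-3",
    prompt="A cute baby sea otter",
    response_format="b64_json",
)

# Decode the first image and save it to a local file
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("otter.png", "wb") as f:
    f.write(image_bytes)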

Here’s a tool function that gives all the API parameters to an AI - it is up to the AI then to use it appropriately and spend your money wisely. You’ll likely want to write your own and then rewrite what is actually sent.

tools_list.extend([{
    "type": "function",
    "function": {
        "name": "create_image",
        "description": "Create an image using either DALL-E 2 or DALL-E 3. AI can call this tool to generate images based on a prompt, model, and specified options.",
        "parameters": {
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": "A text description of the desired image(s). The maximum length is 1000 characters for DALL-E 2 and 4000 characters for DALL-E 3.",
                },
                "model": {
                    "type": "string",
                    "enum": ["dall-e-2", "dall-e-3"],
                    "default": "dall-e-2",
                    "description": "The model to use for image generation. Defaults to DALL-E 2."
                },
                "n": {
                    "type": ["integer", "null"],
                    "default": 1,
                    "description": "The number of images to generate. Must be between 1 and 10. For DALL-E 3, only n=1 is supported."
                },
                "quality": {
                    "type": "string",
                    "enum": ["standard", "hd"],
                    "default": "standard",
                    "description": "The quality of the image that will be generated. 'hd' creates images with finer details and greater consistency across the image. This parameter is only supported for DALL-E 3."
                },
                "response_format": {
                    "type": ["string", "null"],
                    "enum": ["url", "b64_json"],
                    "default": "url",
                    "description": "The format in which the generated images are returned. Must be one of 'url' or 'b64_json'. URLs are only valid for 60 minutes after the image has been generated."
                },
                "size": {
                    "type": ["string", "null"],
                    "description": "The size of the generated images. Must be one of '256x256', '512x512', or '1024x1024' for DALL-E 2. Must be one of '1024x1024', '1792x1024', or '1024x1792' for DALL-E 3 models."
                },
                "style": {
                    "type": ["string", "null"],
                    "enum": ["vivid", "natural"],
                    "default": "vivid",
                    "description": "The style of the generated images. Must be one of 'vivid' or 'natural'. 'Vivid' causes the model to lean towards generating hyper-real and dramatic images. 'Natural' causes the model to produce more natural, less hyper-real looking images. This parameter is only supported for DALL-E 3."
                }
            },
            "required": ["prompt"]
        }
    }
}])

Thanks. I understand that the JSON you provided above is for creating a tool on the assistant. Should I use the assistants endpoint to create this?

And then, how do I call it?

When the AI is provided a tool specification it can use, it will call the tool when, based on the name and description, the tool function seems like it will fulfill a user need.

For example, a chatbot that gives customer service may have your tool “product search”, where your code queries your pricing database to search and return results.

You do not need to use assistants. The chat completions endpoint also allows you to provide tools to the AI, and it will emit tool_calls with the function arguments that your code must fulfill instead of a response to the user.
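
A minimal sketch of that on chat completions, assuming the tools_list with the create_image function from earlier and an example model name:

import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",  # example model name
    messages=[{"role": "user", "content": "Please draw me a bright sun"}],
    tools=tools_list,  # the list containing the create_image function above
)

message = response.choices[0].message
if message.tool_calls:
    # The AI chose to call the function instead of answering directly
    call = message.tool_calls[0]
    arguments = json.loads(call.function.arguments)
    print(call.function.name, arguments)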

OK, but from what part of this JSON will the endpoint understand that it has to call the generations endpoint to fulfill the requirement?

And is it OK if I post this to the assistants endpoint when I create the assistant, with the parameter “tools” and type “function”?

The AI cannot call the “generations endpoint”. It can only call your code, which makes and receives API calls. You are the one that implements the function of producing images, providing a UI to show or download them, and telling the AI the function executed successfully.
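
In other words, your code needs something roughly like this sketch (the function name and defaults are just examples; showing or downloading the image is still up to your UI):

import json
from openai import OpenAI

client = OpenAI()

def handle_create_image(arguments_json: str) -> str:
    """Fulfill a create_image tool call by calling the images API."""
    args = json.loads(arguments_json)
    image = client.images.generate(
        model=args.get("model", "dall-e-2"),
        prompt=args["prompt"],
        size=args.get("size", "1024x1024"),
    )
    url = image.data[0].url
    # Display or download the image in your own UI here,
    # then report success back to the AI as the tool output
    return json.dumps({"status": "success", "image_url": url})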

Giving tools to assistants is similar, but the handling is quite different; after polling, you get a new run status saying the thread requires action, and then you must return tool outputs through a specific method.
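
With polling instead of streaming, the flow looks roughly like this sketch (assuming an existing thread and assistant, and the handle_create_image helper from the sketch above):

import time

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Poll until the run finishes or asks for tool output
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

if run.status == "requires_action":
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        result = handle_create_image(call.function.arguments)
        outputs.append({"tool_call_id": call.id, "output": result})
    # Return the tool results so the run can continue
    run = client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=outputs,
    )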

The API reference has details of how to use assistants. However, if you are providing an image generation service, I would not bother with putting a chatbot in front of it. You are not adding any value that ChatGPT Plus doesn’t already have.

Actually, I am trying to develop an application with multiple added values. But it should also be able to provide what ChatGPT Plus already has. I can’t tell my users, “go use ChatGPT Plus when you need image generation”. They probably don’t want to pay for ChatGPT Plus when they already pay me.

So, this is necessary for my app.

So, I create the assistant with this function. And when the user asks for image generation, it will return something other than “thread.message.delta” or “thread.message.completed”.
I have to catch it and then submit the prompt to the generations endpoint.

Is that correct?

Where is this documented?

https://platform.openai.com/docs/api-reference/assistants-streaming/events

Thank you. I am a little lost. The documentation is not clear enough.
When I test, the assistants endpoint returns many “thread.run.step.delta” events and a final “thread.run.requires_action”.
I can extract the following “required_action” from the response:

"required_action":
{"type":"submit_tool_outputs","submit_tool_outputs":{"tool_calls":[{"id":"call_FfStEfEs38OtaJ6uxss4Eqj0","type":"function","function":{"name":"create_image","arguments":"{\"prompt\":\"A bright sun \",\"model\":\"dall-e-2\",\"size\":\"512x512\"}"}}]}}

The “run” is not completed. From this output, I can get the “prompt” and post it to dall-e-2.
And in return, it will provide me an image (URL).

But how do I continue the “run”?

I get that I have to call the endpoint described at “https://platform.openai.com/docs/api-reference/runs/submitToolOutputs”.
But what will be the output sent?
Just the URL?

OK. I tried sending the URL, and I can see that the image is sent to the assistant; I can even see its response on the dashboard. But my stream listener does not receive the response from the assistant.

You cannot send an image URL by tool for anything other than having the AI report on that value to the user.

Images for vision or for code interpreter must be attached to a thread user message, which cannot be modified during a run.
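
If you want the assistant itself to see the generated picture, a rough sketch (assuming the current message content types, a vision-capable model on the assistant, and an image_url variable holding the generated URL) is to add a new user message once the run has completed:

# After the run has completed, attach the generated image to the thread
# so a later run can look at it
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=[
        {"type": "text", "text": "Here is the image that was generated."},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)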

The “documentation” link on the forum sidebar takes just a click or two to reach an example that demonstrates the Python SDK method for returning a tool response while streaming and then awaiting more streamed response:

    def submit_tool_outputs(self, tool_outputs, run_id):
        # Use the submit_tool_outputs_stream helper
        with client.beta.threads.runs.submit_tool_outputs_stream(
            thread_id=self.current_run.thread_id,
            run_id=self.current_run.id,
            tool_outputs=tool_outputs,
            event_handler=EventHandler(),
        ) as stream:
            for text in stream.text_deltas:
                print(text, end="", flush=True)
            print()

I am not using Python, but Java.
Since there is no official Java SDK, I am trying to build my own library.
So it is not as easy as what you have in Python.

The problem is that I am making the POST to the submit_tool_outputs endpoint, and I am expecting to receive a stream of events as with the messages endpoint. But I receive it as a direct response to the POST call.

I think I found where the problem is.