I also added a description where there is an “enum”.
The JSON I made:
{"tools": [{
"type": "function",
"function": {
"name": "createImage",
"description": "Create an image using DALL.E",
"parameters": {
"type": "object",
"properties": {
"prompt": {"type": "string", "description": "A text description of the desired image. The maximum length is 4000 characters"},
"size": {"type": "string", "enum": ["1024x1024", "1792x1024", "1024x1792"], "description": "The size of the generated images"},
"style": {"type": "string", "enum": ["vivid", "natural"], "description": "The style of the generated images. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images."}
},
"required": ["prompt"]
}
}
}
So is my JSON right, or does it need fixes or improvements?
DALL-E 3 on the API has its own AI that rewrites prompts. The more this AI writes, the longer it takes to get an image back.
The real maximum length that can be passed into DALL-E 3 after its own AI pre-filter is 256 tokens.
So don’t waste time and expense having your chatbot AI write a novel as a prompt one token at a time. Instruct your AI, and the DALL-E 3 AI (perhaps by your own backend injection of additional “jailbreak” prompt text within the function handler), to prefer passing the user’s input through without alteration if it already conforms.
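A minimal sketch of what that function handler could look like, assuming the official `openai` and `tiktoken` Python packages; the `handle_create_image` name, the pass-through wrapper wording, and the use of the cl100k_base encoding as a stand-in for DALL-E 3’s tokenizer are all my own illustrative choices, not anything documented:

```python
import json
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")  # rough stand-in for DALL-E 3's tokenizer

# Illustrative wrapper text; the exact wording that best suppresses rewriting
# is something you have to experiment with.
PASS_THROUGH = (
    "My prompt is already fully detailed. "
    "Use it exactly as written, without rewriting: "
)

def handle_create_image(arguments_json: str) -> str:
    """Hypothetical backend handler for the 'createImage' tool call."""
    args = json.loads(arguments_json)   # tool_call.function.arguments
    prompt = args["prompt"]

    # Anything past ~256 tokens is discarded by DALL-E 3 anyway, so trim early
    # rather than paying the chat model to write a novel.
    tokens = enc.encode(prompt)
    if len(tokens) > 256:
        prompt = enc.decode(tokens[:256])

    response = client.images.generate(
        model="dall-e-3",
        prompt=PASS_THROUGH + prompt,
        size=args.get("size", "1024x1024"),
        style=args.get("style", "vivid"),
        n=1,
    )
    return response.data[0].url
```

The wrapper text is the part worth experimenting with; the token trim just keeps you from paying for prompt text that would be discarded anyway.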
You can write a far longer, multi-line function description than just “create an image”.
You should inform the AI, in the function description, of the internal content prohibitions on real people, recent real artists and styles, etc., and tell it to shape the language it sends correctly and intelligently. The API rewriter will significantly distort the meaning when it rewrites over these prohibitions, and yet DALL-E’s final “content policy” check will still give you a costly error and block the image generation if something like “Mickey Mouse” goes through.
You should inform the AI that negation (trying to discourage imagery elements by writing about them) simply does not work.
You should inform the AI which user language, style, and content, or which explicit instructions, should trigger selection of square, wide, or tall images.
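Folding those last few points into the function description itself, a sketch (with illustrative wording you would tune against your own results) might look like this, written as a Python dict so it drops straight into a tools list:

```python
create_image_tool = {
    "type": "function",
    "function": {
        "name": "createImage",
        "description": """Create an image from a text prompt with DALL-E 3.
Guidelines for the prompt you send:
- If the user's text already describes a scene, pass it through unaltered.
- Never name real people, living artists, or trademarked characters; describe a generic look-alike instead, or generation will be blocked at cost.
- Do not use negation ("no text", "without hats"); describe only what should appear.
- Pick 1792x1024 for wide scenery, 1024x1792 for standing full-body subjects, 1024x1024 otherwise.""",
        "parameters": {
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": "A text description of the desired image. The maximum length is 4000 characters",
                },
                "size": {"type": "string", "enum": ["1024x1024", "1792x1024", "1024x1792"]},
                "style": {"type": "string", "enum": ["vivid", "natural"]},
            },
            "required": ["prompt"],
        },
    },
}
```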
Tall images especially need more prompt language along the lines of “tall portrait-aspect-ratio, full-body-length” passed in, or DALL-E will produce a rotated image. You can also do this by injection after receiving the tool call.
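Continuing the hypothetical handler sketch above, that injection can be a couple of lines; the hint wording is an assumption to experiment with, not a magic string:

```python
# Illustrative: runs after json.loads() of the tool-call arguments,
# before the images.generate() call in the earlier sketch.
TALL_HINT = "A tall portrait-aspect-ratio, full-body-length composition: "

if args.get("size") == "1024x1792":
    # Without phrasing like this, DALL-E 3 often returns a rotated landscape image.
    prompt = TALL_HINT + prompt
```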
“Natural” now produces poor results from a significantly different model method: people who look like they were Photoshopped in.
The tool spec itself looks fine. The list of tools needs to be closed with ], and the outer object with a final }.
An idea for when you’re at the advanced level: another tool property, “send_unaltered”: boolean, that lets the user or the AI decide that DALL-E 3 is not allowed to do the rewriting it normally does.
Also, add a DALL-E 2 function that takes just a prompt. DALL-E 2 is something ChatGPT Plus doesn’t have. Describe to the AI that the function is useful for abstract artistic images like paintings (and the user gets a 1024px image), and that it costs less.
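Putting those two ideas together, the full tools list might look like the sketch below; the second function’s name `createImageDalle2` and all the description wording are my own illustrative choices:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "createImage",
            "description": "Create an image with DALL-E 3; best for detailed, photographic, or dramatic scenes.",
            "parameters": {
                "type": "object",
                "properties": {
                    "prompt": {"type": "string", "description": "A text description of the desired image. The maximum length is 4000 characters"},
                    "size": {"type": "string", "enum": ["1024x1024", "1792x1024", "1024x1792"]},
                    "style": {"type": "string", "enum": ["vivid", "natural"]},
                    "send_unaltered": {
                        "type": "boolean",
                        "description": "Set true to forbid any rewriting of the prompt before image generation",
                    },
                },
                "required": ["prompt"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "createImageDalle2",
            "description": "Create a 1024px image with DALL-E 2; cheaper, useful for abstract artistic images such as paintings.",
            "parameters": {
                "type": "object",
                "properties": {
                    "prompt": {"type": "string", "description": "A text description of the desired image"},
                },
                "required": ["prompt"],
            },
        },
    },
]  # note the closing ] that the original JSON was missing
```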
Much of that is my own knowledge from solving others’ problems.
This forum has its own search, where you get useful responses members have written instead of Google results full of bait videos. You can add “@_j” to the search terms to get what I’ve said before. It even has an AI.
Really pore over what they write. For example, they offer a prompt to get less AI rewriting by DALL-E 3, but it is not as effective as wrapping your prompt in a subterfuge of lies to get absolutely no alterations to what you sent…
Other specifics, like the actual token length after which input is discarded, come from a Discord chat with DALL-E developers; others from probing ChatGPT into revealing how it uses its DALL-E tool; and others simply from employing the black box of image generation and seeing what it will produce for you (up to and including having its rewritten prompt be the AI dumping out its own programming).
OpenAI doesn’t even document “hey, we programmed our AI to output markdown formatting at you”. Other non-novel API mechanisms are often regarded as secrets, documented by misdirection, and curtailed by blocking. They don’t tell you how to make a chatbot that exceeds ChatGPT, either…
advanced tricks, for me to know…
Hey, how can you still produce images? Is your end alright and functional? My experience with both the chat and the API was abysmal; it almost always refused my prompts.
Prompts that “trigger” content policy keywords the authoring AI is not aware of, and that make sense only when viewed through OpenAI’s motivations, will be denied: trademark violations, political imagery, real persons, copying copyrighted works, etc. Understand how it works, don’t try to skew the narrative by dropping “Israel” or “Hamas” into the prompt, contemplate how you’d write a word filter yourself under similar publicity pressures, and you’ll have better success.
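If you do want a crude local pre-screen before spending an API call that moderation will reject anyway, it can be as simple as a term list built from your own past denials; OpenAI’s actual filter is not published, so the terms below are only the examples mentioned in this thread:

```python
# Illustrative client-side pre-screen; only catches cases you already know get denied.
BLOCKED_TERMS = {"mickey mouse", "israel", "hamas"}  # extend from your own rejections

def likely_blocked(prompt: str) -> bool:
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKED_TERMS)
```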