Misinformation with Vision and Image Descriptions

I hope someone can give me some clarity on this, please, as I'm struggling to find a definitive resolution.

From what I understand, you CANNOT upload an image to DALL-E along with a description of how you would like the image changed.

So the workaround is to upload the 'reference' image to the OpenAI Vision API, which returns a comprehensive text description, or 'prompt', that we can then send to the OpenAI image generator to produce the new image.
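As a sketch of that two-step workaround, the helpers below build the two request payloads: one asking a vision-capable chat model to describe the reference image, and one sending the resulting description to the images endpoint. The model names and payload shapes reflect the API as of late 2023 and may change; the prompt text is my own illustration, not anything official.

```python
import base64


def build_vision_request(image_bytes: bytes,
                         model: str = "gpt-4-vision-preview") -> dict:
    """Build a chat-completions payload asking a vision model to describe an image.

    The image is sent inline as a base64 data URL.
    """
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": ("Describe this image in enough detail that "
                              "an artist could reproduce it.")},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
        "max_tokens": 500,
    }


def build_image_request(description: str) -> dict:
    """Build an images-API payload that turns the description into a new image."""
    return {
        "model": "dall-e-3",
        "prompt": description,
        "size": "1024x1024",
        "n": 1,  # dall-e-3 only accepts n=1
    }
```

Each payload would then be passed to the SDK, e.g. `client.chat.completions.create(**build_vision_request(img))` for the description, and `client.images.generate(**build_image_request(text))` for the generation step.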

There is just so much misinformation: DALL-E 2 did work and DALL-E 3 does not; Vision is or isn't included in the subscription. I am confused dot com :frowning:

dall-e-3 on the API requires its own separate set of parameters and has limitations on others. Here is my write-up from November 2023, which also includes sample code with commented parameters that one could employ for using "3", along with inline comments on other parameters with special notes.

To start with the example code, besides installing the modules required for image handling, one can just uncomment the first line that switches the model name.

(Despite other changes to the OpenAI SDK, this code still runs fine to download and show you an image - and a DALL-E 3 oil painting instruction is looking better than those returned originally.)
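Since the original sample code is not reproduced here, a minimal sketch of the dall-e-3-specific parameters mentioned above might look like the following. The `dalle3_params` helper is my own illustration; the parameter names and allowed values (`quality`, `style`, the three sizes, `n=1`) match the images API as documented in late 2023 and could change.

```python
# Sizes accepted by dall-e-3 (dall-e-2 instead supports 256x256/512x512/1024x1024)
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}


def dalle3_params(prompt: str, size: str = "1024x1024",
                  quality: str = "standard", style: str = "vivid") -> dict:
    """Assemble keyword arguments for client.images.generate() with dall-e-3."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"dall-e-3 only supports sizes {sorted(DALLE3_SIZES)}")
    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "size": size,
        "quality": quality,            # "standard" or "hd" - dall-e-3 only
        "style": style,                # "vivid" or "natural" - dall-e-3 only
        "n": 1,                        # dall-e-3 rejects n > 1
        "response_format": "b64_json", # or "url"
    }
```

The call itself would then simply be `client.images.generate(**dalle3_params("an oil painting of a lighthouse"))`.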

The input to DALL-E is only plain text. You can of course have another AI with vision capability look at an image, and give that AI instructions to produce a description detailed enough that a forensic artist could replicate the original.

I hope that gives you clarity on API use, and that you can thus provide DALL-E to a language AI as a tool, which the AI can use by sending the images API endpoint a prompt and specifications.
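To make that concrete, exposing DALL-E as a tool usually means giving the chat model a function definition it can call with a prompt of its own devising. The schema below is a hypothetical example (the function name `generate_image` and its parameters are my own); it follows the tools format used by the chat completions API.

```python
# A hypothetical tool definition so a chat model can request image generation
image_tool = {
    "type": "function",
    "function": {
        "name": "generate_image",
        "description": "Create an image from a text prompt using DALL-E 3.",
        "parameters": {
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": "Detailed description of the desired image.",
                },
                "size": {
                    "type": "string",
                    "enum": ["1024x1024", "1792x1024", "1024x1792"],
                },
            },
            "required": ["prompt"],
        },
    },
}
```

When the model emits a tool call, your code extracts the arguments, forwards them to the images endpoint, and returns the result (or its URL) back to the conversation.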

Thank you so much for the reply. Yes, I think this is the tricky bit! Do you have any advice? I have tried DeepAI and Google Vision, and GV is really not up to the task :wink: