DALL-E API to generate JSON data from an image

I’m working on a project that, at some point, needs to extract metadata from a user-supplied image. After reviewing the DALL-E API documentation, it seems like my goal might not be achievable. However, this is kind of odd, because ChatGPT allows for image uploads and provides context about the image.

Am I looking in the wrong place? I believe this feature should be available in the API, just as it is in ChatGPT itself. Can anyone clarify this for me?

I think what you need is GPT-4 Vision. But there is no API yet.


Isn’t that weird? The chat interface does let you upload an image and get specific data about it back. Do you know any alternatives for my use case?

Check this post about the metadata contained in the response returned by GPT-4 Vision for your reference.

But since there is no API yet, we can’t know the exact details for sure.

I have been diving into this problem all week and just came across this talk on YouTube from OpenAI engineers (posted 2 days ago).

Step 1:
Get a description from an image (this is the exact problem I’m facing; I don’t know how they do it).

Step 2:
Ask GPT-4 to turn that description into a DALL-E prompt for generating a new image in a certain style.

Step 3:
Compare both images, and generate a new prompt that accounts for the differences between them.

Step 4:
Use that newly created prompt to create the final image.

What I want to know

How are they doing step 1? I’m struggling to figure out how to get a relevant description from an image I feed in. (I’ve sketched below, after the link, how I imagine the later steps chaining together once step 1 is solved.)

I have added the timestamp of where it begins:
https://www.youtube.com/live/veShHxQYPzo?si=4msqcMAvwYzKOOKL&t=4775
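
For context, here is a rough, untested sketch of how I imagine steps 2–4 chaining together, assuming the pre-1.0 `openai` Python package. Step 1 is stubbed out, because that is exactly the piece I’m missing:

```python
import openai

openai.api_key = "sk-..."  # your API key


def describe_image(image_path: str) -> str:
    """Step 1 (the missing piece): image -> description.
    There is no OpenAI API for this yet, so for now this needs a local
    captioning model or a hand-written description."""
    raise NotImplementedError("image -> description is the missing step")


def restyle_prompt(description: str, style: str) -> str:
    """Step 2: ask GPT-4 to rewrite the description as a DALL-E prompt."""
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite this image description as a DALL-E prompt "
                f"in a {style} style:\n{description}"
            ),
        }],
    )
    return resp["choices"][0]["message"]["content"]


def generate_image(prompt: str) -> str:
    """Steps 3/4: generate an image and return its URL.
    (Comparing the two images in step 3 also needs image input,
    so it is blocked on the same missing capability as step 1.)"""
    resp = openai.Image.create(prompt=prompt, n=1, size="1024x1024")
    return resp["data"][0]["url"]


# Usage, once describe_image is filled in:
# description = describe_image("input.jpg")
# print(generate_image(restyle_prompt(description, "watercolor")))
```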

The functionality to do step 1 via the API has not been released yet. It will be, but there are no official timescales for that. You can do it via ChatGPT Plus with image input, so it will be that feature that gets hooked up to the API.

Thanks for clarifying, I came to the same conclusion just a bit later.

Is it normal that features drop later for developers? I feel like it should be the other way around, but that is probably just me.

I wrote a script that does step 1 a while back (and prints the descriptions to a CSV). If you’re interested, you can find it here:
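In short, it uses BLIP via the Hugging Face transformers library. A minimal sketch of the approach (folder paths, model checkpoint, and the token budget here are placeholders; the full script is in the link):

```python
import csv
from pathlib import Path

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# BLIP base captioning model from the Hugging Face hub
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)


def caption(image_path: Path) -> str:
    """Generate a short description for one image."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    return processor.decode(out[0], skip_special_tokens=True)


# Caption every image in a folder and write the results to a CSV.
with open("descriptions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file", "description"])
    for path in sorted(Path("images").glob("*.jpg")):
        writer.writerow([path.name, caption(path)])
```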


I’ll put that BLIP on my RADAR (Realtime Augmentation Describing Artistic Renderings)


I should probably mention that it tends to hallucinate… a lot :sweat_smile:

If you’re interested in something that can do a bit better, I’d recommend this:

It’s a bit more memory-intensive, but there’s a Colab notebook available if you just want to try it out.
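
For a sense of how these heavier captioners run, here’s BLIP-2 via transformers as an example (an illustrative model choice on my part, not necessarily the exact repo linked above; it wants a reasonably large GPU):

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load BLIP-2 in half precision and move it to the GPU.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
)
model.to("cuda")

# Caption a single image.
image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(out[0], skip_special_tokens=True).strip())
```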


Thank you for your suggestion. I will dive into the GitHub repo you sent me! Just one thing: I’m somewhat new to this world and want to try to implement this in my project. Where should I start learning? Are there tutorials out there that can get me up to speed?

Always happy to help!

My best advice on how to get started is: just do it. Use git clone [URL] to clone the repo, get it up and running, and finally import the relevant bits into your project :laughing:

You can ask ChatGPT to explain any errors you run into along the way; it is usually pretty good at that.
