When will vision API become available?

Is there any timeline on when the API will become available for uploading images and having a conversation about them?


There’s no official timeline, but the release statements said it would come to developers “soon after”

So I’d say somewhere between 2 weeks and ~1 month, maybe :thinking:


I would be a little more conservative.

1–3 months, almost certainly by the end of the year.

It’s also something I could see being announced at Dev Day.


What’s included? Not worth building then


I’d like to know as well. I imagine they are not doing anything sophisticated on the backend. Probably just a vision classifier/describer that injects its results into an LLM, which then spits out text based on some instructions. I think a lot of people overestimate the craziness of what OpenAI is doing on the backend.

The magic of what they have built is the LLM; pretty much all of the other stuff is well known and done better by someone else. Depending on cost and need, it might be worth building it in-house. It wouldn’t be that difficult. Both Amazon and Microsoft have vision APIs you can bootstrap a project with, and you’d probably get it done way faster than waiting on the OpenAI team. It only took me about four days to integrate a local Whisper instance with Chat Completions to get a voice agent. I suspect visual inspection and format detection would be easy enough to integrate.
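As a rough sketch of that bolt-on approach: caption the image with a third-party vision API, then inject the caption into the chat prompt. Azure Computer Vision’s v3.2 `describe` call is a real endpoint, but the endpoint URL, key, and prompt wording below are placeholders, and the message format just assumes the usual chat-completions schema.

```python
import json
import urllib.request


def describe_image(image_url: str, cv_endpoint: str, cv_key: str) -> str:
    """Get a one-line caption from Azure Computer Vision (v3.2 'describe').

    cv_endpoint/cv_key are placeholders for your own resource and key.
    """
    req = urllib.request.Request(
        f"{cv_endpoint}/vision/v3.2/describe",
        data=json.dumps({"url": image_url}).encode(),
        headers={
            "Ocp-Apim-Subscription-Key": cv_key,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    captions = body["description"]["captions"]
    return captions[0]["text"] if captions else ""


def build_messages(caption: str, question: str) -> list[dict]:
    """Inject the caption into a chat prompt; the LLM never sees pixels."""
    return [
        {
            "role": "system",
            "content": f"An image captioner described the user's image as: "
                       f"{caption!r}. Answer the user's questions about that image.",
        },
        {"role": "user", "content": question},
    ]
```

You’d pass the result of `build_messages` to a chat-completions call; anything the captioner drops (counts, positions, distances) is of course invisible to the LLM, which is the main weakness of this design.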


If I am remembering correctly, the LLM was trained on image data as well, so I think it’s a bit more sophisticated than upcycling some CLIP output.

I’ve been experimenting more with Bing Chat and Bard image uploads in anticipation of GPT-4V dropping soon and they’re starting to get good, but there’s still a lot of room for improvement.


I find this consistent developers-second approach concerning, tbh. I understand why OpenAI pushes its own products first, but these delays and limits on the API versus their own product do make me wonder how big a priority developers are for OpenAI.


I am trying to curb my expectations, but I hope it will not only describe what is in the image

input: an image of a fruit
banana: 0.76,
apple: 0.23,
orange: 0.15,
grapes: 0.08

but it can also answer my questions regarding the image

input: some random image

query: what is the purple object in the image?
output: barney the dinosaur

query: how many purple objects in the image?
output: 3

query: what is the location of the purple object in the image?
output: { top: 50, right: 327, bottom: 125, left: 85 }

query: how far is the purple object from the camera?
output: 5m

Are you saying that GPT-4V can’t do those?




ChatGPT with GPT-4V can, but the GPT API still does not offer image input, even for GPT-4.
