Calculating the pricing of GPT4V

RiavvioAS · November 28, 2023, 10:37am

Hello there!

I’ve been planning a possible use of the GPT4V API: a prospective client would like to convert some technical drawings into text descriptions. They would then store the descriptions and retrieve them using natural language and a vector DB.

The problem is estimating the cost of the operation: the drawings are pretty large, but when I try to estimate the cost, the widget says that the image has been resized (see image).

Is this normal?
Could it be detrimental to the description phase? A technical drawing might have several annotations which are important for the general understanding.

Thanks in advance!

Foxalabs · November 28, 2023, 11:15am

The current image processing library is not well suited to large amounts of technical detail, partly because of the potential resizing issues and partly because the model is still a trial. If you expect to send a very detailed, high-resolution image containing lots of text and graphical items and have them all detected accurately, you will potentially have issues.

Fusseldieb · November 28, 2023, 11:20am

Exactly. Your image will get resized and tiled.
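For the cost side of the question, here is a rough sketch of how the resize-and-tile step turns into a token count. It assumes the rules as I understand them from the vision guide: low detail is a flat 85 tokens; high detail first fits the image inside 2048x2048, then shrinks it so the shortest side is at most 768px, then charges 170 tokens per 512px tile plus 85 base tokens. The function name is my own, and I am assuming smaller images are not upscaled, so treat the result as an estimate and check it against the pricing widget.

```python
import math

def estimate_image_tokens(width, height, detail="high"):
    """Rough token estimate for one image (hypothetical helper, not an official API)."""
    if detail == "low":
        return 85  # flat cost, regardless of size

    # Step 1: fit inside a 2048x2048 square, keeping the aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)

    # Step 2: shrink so the shortest side is at most 768px
    # (assumption: images that are already smaller are left alone).
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count 512px tiles; 170 tokens each, plus 85 base tokens.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(estimate_image_tokens(1024, 1024))
print(estimate_image_tokens(2048, 4096))
```

Multiply the result by the current per-token input price to get a cost per image. Note that a very large drawing ends up as only a handful of tiles, which is why small annotations can get lost.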

Maybe you can somehow overcome this problem by splitting the image up into overlapping 768x768 tiles, sending them in the same request (GPT4V accepts that) and then describing them, but YMMV.
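A minimal sketch of the tiling idea above, assuming the tiles overlap by a fixed margin so that an annotation cut by one tile edge is whole in the neighbouring tile. The function name and the 128px overlap are my own choices, not anything the API requires. It only computes the crop boxes; the actual cropping and upload are left out.

```python
def tile_boxes(width, height, tile=768, overlap=128):
    """Return (left, top, right, bottom) crop boxes that cover the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    step = tile - overlap

    def starts(size):
        # An image smaller than one tile needs a single crop.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # last tile sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

boxes = tile_boxes(2000, 1000)
print(len(boxes), boxes[0], boxes[-1])
```

Each box can be passed to something like Pillow's `Image.crop(box)` to produce the tile images. Keep in mind that every tile is billed as its own image, so more tiles means a higher cost per drawing.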

RiavvioAS · November 28, 2023, 12:16pm

Thanks for the feedback, we will have to try some images.
I’ll run some tests and post detailed feedback.

RiavvioAS · November 28, 2023, 2:28pm

Hello, thanks for the feedback!

I have no further info about the prospective client; I only know that they produce hydraulic pumps and not much more.

I suppose their idea is to improve the retrievability of the drawings: if they can obtain a description of each drawing, informed by what the vision model reads, they can turn the descriptions into embeddings, load them into a vector DB and search them using natural language.

That would be cool, but I’m not sure it’s their main concern.

I’ll know more when I have some of those drawings.

RiavvioAS · December 11, 2023, 2:28pm

Brief update on the original question.

It turned out that the client is more interested in parsing very long technical documents.
There is no need to “parse” the drawings, so somebody else will have to do a proper test.
I’ll have to evaluate a RAG method with very long documents (dozens of pages).
