Gpt-4-vision-preview model for other document types not just images

hain2005 · January 12, 2024, 3:00pm

Hello,
I and using gpt-4-vision-preview model successfully passing in url images in ‘image_url’ tag, but wondering when I can upload in other documents in the chat conversation like plain text, CSV, MS Word or Excel?
Thanks
Hai

trenton.dambrowitz · January 12, 2024, 3:05pm

That sounds like a job better suited to the Assistants API, which unfortunately doesn’t support the vision model at this time.

Assistants tools - OpenAI API

cyzgab · January 12, 2024, 3:06pm

What’s the use case you have in mind?

hain2005 · January 12, 2024, 3:27pm

We want to get summaries of the upload documents, key phrases, sentiments, etc. In addition, we would like chat suggestions in rewriting certain part of the documents.

hain2005 · January 12, 2024, 3:30pm

We use Azure OpenAi and can do all this but had to reply on other Azure services like Vision and Language. We are at the mercy of the SDK library @azure/openai current at version 1.0.0-beta.7.

cyzgab · January 13, 2024, 3:16pm

Why not read the file (or process via OCR) in run time to extract the text in the file & send it to the model.

The GPT vision model, perhaps rightfully, takes an image as one type of input.

Anyway, based on how the apis are designed - more multimodality is definitely on the horizon

hain2005 · January 13, 2024, 8:51pm

Yes, that’s what I did on all other documents type. I actually use Azure AI Language to give me a summary, and then feed the summary into ChatCompletion to get it in the chat conversion for any further prompt. I want to reduce the token usage. It does the job but would entertain a different way if it’s better. Thxs

Topic		Replies	Views
How can I upload documents API chatgpt	1	161	January 13, 2025
Document Ingestion EndPoint API gpt-4	4	594	April 13, 2024
Programatically reproduce gpt-4o file upload API gpt-4o	5	423	December 19, 2024
Make OpenAI Vision API Match GPT4 Vision API chatgpt	4	3637	December 6, 2023
Send files to completion api API gpt-4 , chat-completion , pdf	8	28920	June 27, 2024

Gpt-4-vision-preview model for other document types not just images

Related topics