I am using the gpt-4-vision-preview model and successfully passing image URLs via the ‘image_url’ tag, but I'm wondering when I will be able to upload other document types in the chat conversation, like plain text, CSV, MS Word, or Excel?
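For reference, here is a minimal sketch of the image-URL flow being described, using the OpenAI Python SDK's Chat Completions message format (the URL is a hypothetical placeholder):

```python
def build_vision_messages(prompt: str, image_url: str) -> list:
    """Build a Chat Completions `messages` payload that mixes a text
    part with an `image_url` part, as gpt-4-vision-preview expects."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

messages = build_vision_messages(
    "Describe this image.",
    "https://example.com/photo.png",  # hypothetical URL
)
# You would then pass `messages` to the Chat Completions endpoint, e.g.:
# client.chat.completions.create(model="gpt-4-vision-preview", messages=messages)
```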
That sounds like a job better suited to the Assistants API, which unfortunately doesn’t support the vision model at this time.
What’s the use case you have in mind?
We want to get summaries of the uploaded documents, key phrases, sentiment, etc. In addition, we would like chat suggestions for rewriting certain parts of the documents.
We use Azure OpenAI and can do all of this, but we had to rely on other Azure services like Vision and Language. We are at the mercy of the SDK library @azure/openai, currently at version 1.0.0-beta.7.
Why not read the file (or process it via OCR) at run time to extract the text and send it to the model?
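The suggested extract-then-send approach can be sketched with the standard library alone, assuming plain-text and CSV inputs (Word, Excel, or scanned documents would need a dedicated parser or an OCR step, not shown here):

```python
import csv
from pathlib import Path

def extract_text(path: Path) -> str:
    """Extract plain text from a .txt or .csv file at run time.
    Other formats (Word, Excel, scanned PDFs) would need a parser or OCR."""
    if path.suffix == ".csv":
        with path.open(newline="") as f:
            return "\n".join(", ".join(row) for row in csv.reader(f))
    return path.read_text()

def build_prompt(document_text: str) -> list:
    """Wrap the extracted text in a chat message asking for a summary."""
    return [
        {"role": "system", "content": "You summarize documents."},
        {"role": "user", "content": f"Summarize this document:\n\n{document_text}"},
    ]
```

The extracted string can then be sent to the Chat Completions endpoint like any other text prompt.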
The GPT vision model, perhaps rightfully, takes an image as one type of input.
Anyway, based on how the APIs are designed, more multimodality is definitely on the horizon.
Yes, that’s what I did for all other document types. I actually use Azure AI Language to give me a summary, then feed the summary into ChatCompletion so it's part of the chat conversation for any further prompts. I want to reduce token usage. It does the job, but I'd entertain a different approach if it's better. Thanks!
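The summarize-then-chat flow described above can be sketched as follows. This is only an illustration: the `naive_summary` function is a stand-in for a real summarization service (the poster uses Azure AI Language), and the function names are hypothetical:

```python
def naive_summary(text: str, max_chars: int = 200) -> str:
    """Placeholder for a real summarizer: keep the leading sentences
    that fit under max_chars, dropping the rest."""
    kept, length = [], 0
    for sentence in text.split(". "):
        if length + len(sentence) > max_chars:
            break
        kept.append(sentence)
        length += len(sentence)
    return ". ".join(kept)

def build_chat_context(document_text: str) -> list:
    """Seed the conversation with the (much shorter) summary instead of
    the full document, so follow-up prompts cost fewer tokens."""
    summary = naive_summary(document_text)
    return [
        {
            "role": "system",
            "content": "Answer questions about this document summary:\n" + summary,
        }
    ]
```

The token saving comes from every subsequent turn carrying only the summary in context rather than the whole document.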