Integrate both gpt-4 and gpt-4 vision in same chat

benjamin.bascary · January 24, 2024, 3:27pm

Hi there!

I'm currently developing a simple chatbot UI using Next.js and the OpenAI library for JavaScript, and I ran into the following problem:

Currently I have two endpoints: one for normal chat, where I pass the model as a parameter (in this case "gpt-4"), and another where I pass gpt-4-vision. So I have two separate endpoints to handle images and text.

Is there any way to handle both functionalities in a single chat session (like ChatGPT does right now)? The documentation is not clear and doesn't give examples of how to integrate both functionalities in one chat. Should we upload the file separately and then send it as a message inside the context (as an image URL or a reference)?

like:

{
  "role": "user",
  "content": `Message: ${message}? ImageUrl: {image URL after uploading to OpenAI server}`
}

Could someone help here, please? Has anyone run into the same problem before?
Any ideas are welcome.

Cheers!

Diet · January 24, 2024, 3:36pm

Hi! Welcome to the forum!

You can have a conversation thread in vision just like with the chat model:

data = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "assistant",
            "content": "Hello! How can I help you today?"
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ]
}

Is that what you mean?
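To keep one thread going across text-only and image turns, something like the sketch below might work. `build_user_message` is a hypothetical helper (not part of the OpenAI library), the example.com URL is a stand-in, and the assistant turn is a made-up placeholder for whatever the model actually returns.

```python
from typing import Optional


def build_user_message(text: str, image_url: Optional[str] = None) -> dict:
    """Build one user turn; attach the image as a content part only if given."""
    if image_url is None:
        # Plain text turn: content is a simple string.
        return {"role": "user", "content": text}
    # Image turn: content is a list of typed parts, as in the payload above.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# One conversation mixing an image turn and a text-only follow-up.
messages = [{"role": "system", "content": "You are a helpful assistant."}]
messages.append(
    build_user_message("What's in this image?", "https://example.com/boardwalk.jpg")
)
# Placeholder for the model's reply, appended so the follow-up has context.
messages.append({"role": "assistant", "content": "(model reply goes here)"})
messages.append(build_user_message("What season does it look like?"))
```

The whole `messages` list is what gets sent on each request, so the follow-up question still has the earlier image turn in its context.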

If the user uploads an image, or if there are already images in the thread, you can just switch to vision; otherwise you can stay with turbo to save costs or RPD (requests per day).
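A minimal sketch of that switch, assuming messages use the same shape as the payload above. The function names are made up for illustration, and `TEXT_MODEL` is an assumption: swap in whichever text model you actually use.

```python
TEXT_MODEL = "gpt-4-turbo-preview"    # assumption: replace with your text model
VISION_MODEL = "gpt-4-vision-preview"


def thread_has_image(messages: list) -> bool:
    """Return True if any message carries an image_url content part."""
    for message in messages:
        content = message.get("content")
        # Text-only messages have a string content; image turns use a list of parts.
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "image_url":
                    return True
    return False


def pick_model(messages: list) -> str:
    """Use the vision model only when the thread actually contains an image."""
    return VISION_MODEL if thread_has_image(messages) else TEXT_MODEL
```

You would call `pick_model(messages)` right before each request and pass the result as the `model` field, so a single endpoint can serve both cases.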

benjamin.bascary · January 24, 2024, 4:22pm

Hey! Thank you for your fast response!

I'm thinking about the pricing here. Yes, I know I could chat normally with the vision model, but that would be costly compared to "normal" gpt-4, right? I don't know how OpenAI handles this. Maybe I can iterate over the messages and check whether there is an image-type message in the context, and if not, switch to the normal model?

Diet · January 24, 2024, 5:48pm

Actually, vision costs the same as turbo: see Pricing.

The only problem I would see is the rate limit, but it depends on your use-case, and your tier: https://platform.openai.com/docs/guides/rate-limits/usage-tiers

If the rate limit is a show stopper, you could add a function/tool to plain old GPT-4: if there are images in your thread, you just mask them. If the user is trying to reference an image, or if an image needs to be referenced for an answer, GPT-4 calls the function, and on that call you just send the whole unmasked thread to vision.

That means some vision calls might be almost twice as expensive in terms of context, because the thread is sent once to GPT-4 and then again to vision, but you might be able to optimize that with some word filtering or other heuristics.

Just an idea.
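A rough sketch of the masking half of that idea, under some assumptions: `look_at_images`, `mask_images` and `wants_vision` are hypothetical names, the placeholder text is arbitrary, and only the outer shape of `LOOK_TOOL` follows the Chat Completions `tools` format.

```python
# Hypothetical tool the text model can call when it needs to see an image.
LOOK_TOOL = {
    "type": "function",
    "function": {
        "name": "look_at_images",
        "description": "Call this when answering requires seeing an image in the conversation.",
        "parameters": {"type": "object", "properties": {}},
    },
}


def mask_images(messages: list) -> list:
    """Return a copy of the thread with each image part replaced by a text placeholder."""
    masked = []
    image_count = 0
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            masked.append(dict(message))  # text-only turn, copied unchanged
            continue
        pieces = []
        for part in content:
            if part.get("type") == "image_url":
                image_count += 1
                pieces.append(f"[image {image_count} attached]")
            elif part.get("type") == "text":
                pieces.append(part["text"])
        masked.append({**message, "content": " ".join(pieces)})
    return masked


def wants_vision(tool_call_names: list) -> bool:
    """True if the text model asked for the (hypothetical) look_at_images tool."""
    return "look_at_images" in tool_call_names
```

The intended flow: send `mask_images(messages)` with `tools=[LOOK_TOOL]` to the text model; if the reply contains a call to `look_at_images`, resend the original, unmasked `messages` to the vision model instead.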

masfour700 · February 26, 2024, 10:22am

Hi @benjamin.bascary, were you able to figure out the pricing question? For a follow-up question in the same chat session, will the image from the first message be converted to tokens (and billed) again?
