GPT-4-Vision Interesting Uses and Examples Thread (2023)

What fun things have you been doing with GPT-4-Vision?

Can be via API or ChatGPT …

I hope to see a lot of great examples…

Here’s one of mine.

Asking GPT-4-Vision to identify #DND version from ToC… it got kinda close! Can you do better as a human?

This was a fun thread…

Do you have a vision puzzle you need solved?

I want my home to be paperless. I already have a document scanner which names the files depending on the contents but it is pretty hopeless.
So I am writing a .NET app using gpt-4-vision-preview that looks through all the files the scanner dumps into a folder, names them based on their contents, and files them in the correct directory on my PC.

Problems so far:
The API kept rejecting the image with “I’m sorry, but I cannot assist with requests that involve processing images that may contain sensitive personal data such as credit card information”. I think I have worked around that by getting it to produce JSON output, even though I don’t have the

response_format = {"type": "json_object"}

parameter in my code. I just tell it I want JSON out, and it provides something resembling JSON and no longer refuses the task. I can clean up the JSON in code later.
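
The "clean up the JSON in code later" step could look something like the following. This is only a minimal sketch in Python (the app described above is .NET), and `extract_json` is a hypothetical helper name, not part of any library. It assumes the reply contains one JSON object, possibly wrapped in Markdown code fences or surrounded by chatty text.

```python
import json
import re


def extract_json(reply: str):
    """Return the first JSON object found in a model reply, or None."""
    # Remove Markdown code-fence markers (three backticks, optionally
    # followed by "json") in case the model wrapped its answer in them.
    cleaned = re.sub(r"`{3}(?:json)?", "", reply)
    decoder = json.JSONDecoder()
    # Try to decode from each "{" in turn until one parses successfully.
    for match in re.finditer(r"\{", cleaned):
        try:
            obj, _end = decoder.raw_decode(cleaned, match.start())
        except json.JSONDecodeError:
            continue
        return obj
    return None
```

Returning `None` instead of raising lets the caller decide whether to retry the request or park the file for manual review.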

Now I am categorising the document into various classes - credit card receipt, cash receipt, delivery order, etc. - so the code can later move the file to the correct directory. It also extracts the transaction date, amount (including currency, as I travel a lot), establishment, item and card number. Unfortunately it seems I have to get the whole document in JSON format so it won’t refuse on security grounds, rather than just asking it to retrieve a certain subset of information in a small CSV or JSON object to keep costs down.
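
For the filing step, a rough Python sketch is below. The folder names in `CATEGORY_DIRS`, the field names (`category`, `date`, `establishment`, `amount`) and the `file_document` helper are all assumptions made up for illustration; the real app would use whatever keys the model actually returns.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from document class to a sub-folder of the archive.
CATEGORY_DIRS = {
    "credit card receipt": "Receipts/Card",
    "cash receipt": "Receipts/Cash",
    "delivery order": "Delivery Orders",
}


def file_document(scan_path, fields, archive_root):
    """Rename a scanned file from its extracted fields and move it into place."""
    scan_path = Path(scan_path)
    # Unknown or missing categories go to an "Unsorted" folder.
    subdir = CATEGORY_DIRS.get(fields.get("category", ""), "Unsorted")
    target_dir = Path(archive_root) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)

    # Build a name from date, establishment and amount; missing fields
    # fall back to "unknown".
    parts = [str(fields.get(key, "unknown")) for key in ("date", "establishment", "amount")]
    name = "_".join(parts)
    # Replace characters that are awkward in file names with "-".
    safe = "".join(c if c.isalnum() or c in " ._-" else "-" for c in name)

    target = target_dir / f"{safe}{scan_path.suffix}"
    shutil.move(str(scan_path), str(target))
    return target
```

Note that this sketch does not guard against two documents producing the same name; a real version would probably append a counter or a hash.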

The item is also to be decided by AI - if it recognises a receipt for a meal it will determine breakfast, brunch, lunch, etc. by the timestamp on the receipt. Alternatively it might decide the receipt is for groceries or snacks, etc. Or the document is something completely different - quick user guide for some electronics item or a delivery order or a bill from my phone company.
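
If the model only returns the timestamp, the meal label could also be derived in code rather than by the AI. The sketch below is one possible way to do that; the hour boundaries and the labels beyond breakfast, brunch and lunch are assumptions, not something from the app described above.

```python
from datetime import datetime

# Assumed hour windows (start inclusive, end exclusive); adjust to taste.
MEAL_WINDOWS = [
    (5, 10, "breakfast"),
    (10, 12, "brunch"),
    (12, 15, "lunch"),
    (15, 17, "snack"),
    (17, 22, "dinner"),
]


def meal_for(timestamp: datetime) -> str:
    """Map a receipt timestamp to a meal label using the windows above."""
    for start, end, label in MEAL_WINDOWS:
        if start <= timestamp.hour < end:
            return label
    # Anything outside the listed windows (22:00 to 04:59).
    return "late-night meal"
```

Since receipts come from different countries, this assumes the timestamp printed on the receipt is already in local time.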

The aim is to just scan my documents for the month into a folder & have the AI do the rest - probably 100 documents per month - and hoping to do it for 1 cent or less per document - at the moment I am looking at 2 or 3 cents to process each one.
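
To see where the cents go, here is a rough per-document cost estimate. It assumes the tile-based image accounting that OpenAI documented for the vision preview (85 base tokens plus 170 per 512-pixel tile in high detail, a flat 85 in low detail) and placeholder prices of $0.01 per 1K input tokens and $0.03 per 1K output tokens. Both the rule and the prices are assumptions here and should be checked against current documentation; the function names are hypothetical.

```python
import math


def estimate_image_tokens(width, height, detail="high"):
    """Estimate image tokens under the assumed tile-based accounting."""
    if detail == "low":
        return 85
    # Step 1: shrink to fit inside 2048 x 2048, keeping the aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Step 2: shrink so the shortest side is at most 768 px
    # (assumption: smaller images are not scaled up).
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Step 3: count 512 px tiles, 170 tokens each, plus 85 base tokens.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles


def estimate_cost_usd(image_tokens, prompt_tokens, output_tokens,
                      input_price_per_1k=0.01, output_price_per_1k=0.03):
    """Cost of one request in USD; the default prices are assumptions."""
    input_cost = (image_tokens + prompt_tokens) / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


if __name__ == "__main__":
    tokens = estimate_image_tokens(1024, 1024)
    per_doc = estimate_cost_usd(tokens, prompt_tokens=100, output_tokens=300)
    print(f"{tokens} image tokens, about ${per_doc:.4f} per document, "
          f"${per_doc * 100:.2f} per 100 documents")
```

Under these assumptions the output tokens can cost as much as the image itself, which would fit the observation that asking for the whole document as JSON pushes the price up. Downscaling scans or using low detail would be the obvious levers to try, if the text stays legible.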

Welcome to the forum.

Very interesting use case. Good luck with the problems. I’ve not worked with vision API much myself yet, but hopefully smarter people will chime in here shortly.

Hope you stick around. We’ve got a great community growing.

Do you have working code you could share that includes this model and this response format? Because it doesn’t seem like a valid option for the vision-preview (at least not currently for me).

This is a full working example for Python for asking about an input image with the new vision-preview. It’s still hard to find simple examples so I thought I’d share:

import base64
import openai
import os


def main():
    # Path to a JPEG image (adjust to your own file)
    image_path = "/Users/Documents/mouse_picture.jpg"

    # Read and encode the image in base64
    with open(image_path, "rb") as image_file:
        encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

    # Craft the prompt for GPT
    prompt_messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Here is an image, is there a mouse in the image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded_image}"
                    }
                }
            ]
        }
    ]

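    # Send a request to GPT.
    # Note: openai.ChatCompletion is the legacy (pre-1.0) interface of the
    # openai Python package; it was removed in openai>=1.0.
    params = {
        "model": "gpt-4-vision-preview",
        "messages": prompt_messages,
        "api_key": os.environ["GPT_API_KEY"],
        # "response_format": {"type": "json_object"},  # optional JSON mode, left disabled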
        "headers": {"Openai-Version": "2020-11-07"},
        "max_tokens": 4096,
    }

    result = openai.ChatCompletion.create(**params)
    print(result.choices[0].message.content)


if __name__ == "__main__":
    main()
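
If the input is not always a JPEG, the message payload can be built by a small helper that guesses the MIME type from the file name. This is just a sketch: `build_vision_messages` is a hypothetical name, and it only constructs the same message structure used in the example above without calling the API.

```python
import base64
import mimetypes


def build_vision_messages(question, image_path):
    """Build a chat 'messages' list with one text part and one inline image."""
    # Guess the MIME type from the file extension (e.g. image/png).
    mime_type, _encoding = mimetypes.guess_type(str(image_path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {image_path}")

    # Read and base64-encode the image so it can be sent as a data URL.
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")

    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                },
            ],
        }
    ]
```

The returned list can be passed as the "messages" entry of the request parameters in place of the hand-written prompt_messages.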

I uploaded some YouTube data and was asking about a new headline idea based on existing data… then I remembered I could also “show it” the thumbnail… It read “game of thrones” in my “style” list on the left and thought it prominent - maybe because the show is popular?

Still… could be a useful tool… especially if it knew what to look for and point out with thumbnail images for YouTube… The code interpreter was useful for the CSV data too…

Just spoke to a Discord member who has been using it to fix his PC, as he has a visual impairment! How awesome!

That’s super cool. What a great idea.

Hmmm… Now I kind of want to bust out some of my breadboards and gadgets and see how well GPT-4V can help me out.
