Image Upload / Recognition via Bing Chat

PaulBellow · August 27, 2023, 1:38am

Did they announce this somewhere and I missed it? Not super-smart, but Bing bot is dumbed down quite a bit in my experience…

Then again, it realized it was a game… just couldn’t say which game… which admittedly most humans would probably have trouble with too…

When do you think it will come to ChatGPT and/or API?

ETA: Pretty funny seeing it try to describe a weird image…

PaulBellow · August 27, 2023, 1:52am

Looks like the Bing system prompt might not have been updated to tell it that it can see images? It claims it can’t see the image, but it clearly knows what’s in it…

PaulBellow · August 27, 2023, 1:54am

I’m sorry, but I am not able to see the image you are referring to. However, based on the description you provided, it appears to be a digital image of a medieval castle on a cliff overlooking a mountain range. The castle is made of stone and has multiple towers and battlements. The castle is built on a cliff with a waterfall cascading down the side. The background consists of a mountain range with snow-capped peaks and a blue sky with clouds. The foreground consists of trees and vegetation. The image appears to be from a video game or a 3D rendering.

I hope this helps! If you have any further questions or if there is anything else I can help you with, please let me know.

This makes it sound like they’re scanning the image then supplying the textual description to the main prompt…
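If Bing works the way Paul suspects, the flow would be: a separate vision model captions the upload, and that caption is pasted into the chat model’s prompt as plain text. Here is a minimal sketch of that caption-then-prompt pattern. It is a guess, not Bing’s actual pipeline: `caption_image` is a hypothetical stand-in for whatever image-to-text model is used, and the prompt wording is made up for illustration.

```python
from typing import Callable


def build_prompt(user_message: str, image_path: str,
                 caption_image: Callable[[str], str]) -> str:
    """Caption the image with a separate model, then inject the caption as text.

    `caption_image` is a hypothetical placeholder for any image-to-text model.
    """
    caption = caption_image(image_path)
    # The chat model never sees pixels, only this description. That would
    # explain replies like "I can't see the image, but based on the description..."
    return (
        "The user attached an image. Description of the image: "
        f"{caption}\n\n"
        f"User: {user_message}"
    )


# Example with a dummy captioner in place of a real vision model
fake_captioner = lambda path: "a medieval castle on a cliff with a waterfall"
print(build_prompt("What game is this from?", "castle.png", fake_captioner))
```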

anon22939549 · August 27, 2023, 1:54am

That’s weird, I feel like it’s been around for a while now.

I directed another user towards it a few days ago.

Maybe the system messages are different between the web and Bing app?

PaulBellow · August 27, 2023, 1:57am

That could be it… dev “typo” or no love for web anymore haha…

N2U · August 27, 2023, 2:15am

I don’t think there was any specific announcement, but Logan teased it some time ago.

As to how this works, this could be what’s going on

I’ve tried doing this a few times using CLIP, and it definitely works well.
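For anyone wanting to try the CLIP route, here is a minimal zero-shot labelling sketch using the Hugging Face `transformers` CLIP classes. The checkpoint `openai/clip-vit-base-patch32` and the candidate labels are assumptions for illustration. Note that CLIP doesn’t write captions: it scores an image against text labels you supply, so you still have to come up with the candidates yourself.

```python
def rank_labels(labels, probs):
    """Pair each candidate label with its probability, highest first."""
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def classify_with_clip(image_path, labels, checkpoint="openai/clip-vit-base-patch32"):
    # Heavy imports are done lazily so rank_labels works without torch/transformers
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(checkpoint)
    processor = CLIPProcessor.from_pretrained(checkpoint)
    with Image.open(image_path) as img:
        inputs = processor(text=labels, images=img.convert("RGB"),
                           return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, num_labels); softmax over the labels
    probs = outputs.logits_per_image.softmax(dim=1)[0].tolist()
    return rank_labels(labels, probs)
```

Usage would look like `classify_with_clip("castle.png", ["a screenshot of a video game", "a photo of a castle", "a painting"])`, which returns `(label, probability)` pairs sorted from most to least likely.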

anon22939549 · August 27, 2023, 3:53am

Now I’m thinking OpenAI should offer a CLIP API endpoint to help devs quickly deploy multimodal apps…

N2U · August 27, 2023, 9:59am

Yes!

+1 from me. That could be absolutely amazing. I made a simple script the other day. All it does is create a CSV file with various data about images in a chosen folder. The most important data entry, the description, is still missing though, because I couldn’t be bothered to write it manually, and loading the full CLIP model would require 4300 times as much RAM.

A simple API for CLIP would be perfect for this!

Warning, sidenote is not family friendly:

Sidenote: There are various sites that run a version of CLIP trained on their own datasets. The most notorious are Danbooru and e621. The first one is made for anime waifus and classifies an image of me as “solo 1boy”. The latter is the most outrageous, since it classifies the size of cock & balls separately into 6 different size categories, all the way from tiny to hyper.

N2U · August 27, 2023, 8:19pm

So, in the absence of a CLIP API, I decided to generate captions for images using Salesforce’s BLIP (Bootstrapping Language-Image Pre-training) model. It’s not perfect, but it works, and it only uses 32 MB of RAM.

Usage

The script is meant to be run from the command line. Put your images in an ‘images’ directory next to the script; it processes every image in that directory and saves the results to a file in the script’s own directory.
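The script hardcodes the ‘images’ folder and ‘output.csv’. If you’d rather choose the folder on the command line, one option is a small `argparse` front end. This is a hypothetical variant: the `--images` and `--output` flags and the `base_dir` parameter are my own additions, and `main()` would need to be changed to take these two paths instead of building them itself.

```python
import argparse
import os


def parse_args(argv=None, base_dir="."):
    """Parse hypothetical --images/--output flags.

    Defaults mirror the script's current layout, relative to base_dir
    (the script would pass os.path.dirname(os.path.abspath(__file__))).
    """
    parser = argparse.ArgumentParser(
        description="Caption images with BLIP and write their metadata to a CSV file.")
    parser.add_argument("--images", default=os.path.join(base_dir, "images"),
                        help="folder containing the images to process")
    parser.add_argument("--output", default=os.path.join(base_dir, "output.csv"),
                        help="path of the CSV file to write")
    return parser.parse_args(argv)
```

With that wired in, something like `python caption_images.py --images ~/Pictures/screenshots` (script name hypothetical) would process a different folder, while running it with no flags keeps the current behaviour.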

Output

The script generates a CSV file named ‘output.csv’ with the following columns:

Filename
Type (PNG, JPEG, JPG, GIF)
Number of Frames (for GIFs)
Height in Pixels
Width in Pixels
File Size in MB
Description (Caption generated by the BLIP model)
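To use the results elsewhere, ‘output.csv’ can be read back with the standard library’s `csv.DictReader`, keyed by the header names the script writes. A minimal sketch; searching the captions for a keyword is just one example use, and `find_images` is a hypothetical helper name.

```python
import csv


def find_images(csv_path, keyword):
    """Return filenames whose BLIP description contains `keyword` (case-insensitive)."""
    matches = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader uses the header row, so columns are accessed by name
        for row in csv.DictReader(f):
            if keyword.lower() in row["Description"].lower():
                matches.append(row["Filename"])
    return matches
```

For example, `find_images("output.csv", "castle")` lists every image whose generated caption mentions a castle.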


import os
import csv
from fractions import Fraction
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the BLIP captioning model and its processor once, at startup
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# File extensions the script treats as images
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif')

# Function to get dimensions of an image
def get_image_dimensions(image_path):
    with Image.open(image_path) as img:
        return img.width, img.height

# Function to get frame count of a GIF (Pillow reports 1 for a still image)
def get_gif_frame_count(gif_path):
    with Image.open(gif_path) as img:
        return getattr(img, "n_frames", 1)

# Function to compute the aspect ratio of an image (not currently written to the CSV)
def compute_aspect_ratio(width, height):
    return Fraction(width, height).limit_denominator()

# Function to get file size in MB
def get_file_size_mb(file_path):
    return os.path.getsize(file_path) / (1024 * 1024)

# Function to write a CSV file
def write_csv(csv_file, csv_data):
    with open(csv_file, 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(["Filename", "Type", "Number of Frames", "Height in Pixels", "Width in Pixels", "File Size in MB", "Description"])
        writer.writerows(csv_data)

# Function to get image caption using BLIP
def get_image_caption(image_path):
    # BLIP expects RGB input; for a GIF this captions the first frame only
    with Image.open(image_path) as img:
        raw_image = img.convert('RGB')
    inputs = processor(raw_image, return_tensors="pt")
    out = model.generate(**inputs)
    return processor.decode(out[0], skip_special_tokens=True)

# Main function
def main():
    # The 'images' folder is expected next to this script
    current_dir = os.path.dirname(os.path.abspath(__file__))
    image_dir = os.path.join(current_dir, 'images')

    # Rows for the CSV, one per image
    csv_data = []

    # Sort so the CSV rows come out in a stable order
    for file in sorted(os.listdir(image_dir)):
        file_path = os.path.join(image_dir, file)
        extension = os.path.splitext(file)[1].lower()
        # Skip subfolders and anything that isn't a supported image
        if not os.path.isfile(file_path) or extension not in IMAGE_EXTENSIONS:
            continue
        width, height = get_image_dimensions(file_path)
        frames = get_gif_frame_count(file_path) if extension == '.gif' else 1
        file_size_mb = get_file_size_mb(file_path)
        description = get_image_caption(file_path)
        # Type is the format taken from the extension, e.g. PNG, JPEG, JPG, GIF
        csv_data.append([file, extension[1:].upper(), frames, height, width, file_size_mb, description])

    # Write the CSV file
    csv_file = os.path.join(current_dir, 'output.csv')
    write_csv(csv_file, csv_data)

# Run the main function
if __name__ == "__main__":
    main()
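The script defines `compute_aspect_ratio` but never writes its result to the CSV. `Fraction(width, height).limit_denominator()` reduces the ratio to lowest terms, and with the default `max_denominator` (1,000,000) that is an exact reduction for any realistic image size, so odd resolutions give odd ratios. A minimal sketch of turning it into a readable column value; the `W:H` format and the `aspect_ratio_label` name are my own choices, not part of the original script.

```python
from fractions import Fraction


def aspect_ratio_label(width, height, max_denominator=1_000_000):
    """Reduce width/height to a fraction and format it as 'W:H'.

    A small max_denominator snaps near-standard sizes to the closest simple ratio.
    """
    ratio = Fraction(width, height).limit_denominator(max_denominator)
    return f"{ratio.numerator}:{ratio.denominator}"


print(aspect_ratio_label(1920, 1080))                      # 16:9
print(aspect_ratio_label(1366, 768))                       # 683:384
print(aspect_ratio_label(1366, 768, max_denominator=20))   # 16:9
```

Adding it to the output would mean an extra header in `write_csv` and an extra value in each `csv_data.append(...)` row.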
