Image Upload / Recognition via Bing Chat

Did they announce this somewhere and I missed it? Not super-smart, but Bing bot is dumbed down quite a bit in my experience…

Then again, it realized it was a game… just couldn’t say which game… which admittedly most humans would probably have trouble with too…

When do you think it will come to ChatGPT and/or API?

ETA: Pretty funny seeing it try to describe a weird image…

Looks like the Bing prompt might not have been updated to tell it that it can see the image? It claims it can’t see it, but it clearly knows what’s in the image…

I’m sorry, but I am not able to see the image you are referring to. However, based on the description you provided, it appears to be a digital image of a medieval castle on a cliff overlooking a mountain range. The castle is made of stone and has multiple towers and battlements. The castle is built on a cliff with a waterfall cascading down the side. The background consists of a mountain range with snow-capped peaks and a blue sky with clouds. The foreground consists of trees and vegetation. The image appears to be from a video game or a 3D rendering.

I hope this helps! If you have any further questions or if there is anything else I can help you with, please let me know.

This makes it sound like they’re scanning the image then supplying the textual description to the main prompt…
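If that’s the pipeline, it’s easy to sketch: a captioning model describes the image, and that description gets spliced into the chat model’s prompt. The function name and prompt wording below are purely my own invention, just to illustrate the idea:

```python
# Hypothetical sketch of a "caption, then prompt" pipeline: an image model
# produces a text description, which is injected into the chat prompt.
# Nothing here reflects Bing's actual implementation.

def build_prompt(user_message, image_caption=None):
    """Combine the user's message with an image caption, if one exists."""
    if image_caption is None:
        return user_message
    return (
        f'The user attached an image. An image-captioning model '
        f'described it as: "{image_caption}".\n\n'
        f'User message: {user_message}'
    )

prompt = build_prompt(
    "What game is this from?",
    "a medieval castle on a cliff overlooking a mountain range",
)
print(prompt)
```

The chat model never sees pixels at all, which would explain why it can describe the scene in detail while insisting it can’t see the image.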


That’s weird, I feel like it’s been around for a while now.

I directed another user towards it a few days ago.

Maybe the system messages are different between the web and Bing app?


That could be it… dev “typo” or no love for web anymore haha…

I don’t think there was any specific announcement, but Logan teased it some time ago.

As to how this works, this could be what’s going on

I’ve tried doing this a few times using CLIP, and it definitely works well :laughing:
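For anyone who hasn’t played with it: CLIP embeds images and candidate text labels into the same vector space and picks the label with the highest similarity. Here’s a toy sketch of just that zero-shot scoring step, with random dummy vectors standing in for the real encoders (in practice you’d get the embeddings from the actual CLIP model, e.g. via Hugging Face transformers):

```python
# Toy sketch of CLIP-style zero-shot labeling. Dummy embeddings replace the
# real image/text encoders; only the scoring logic is shown.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(image_embedding, label_embeddings):
    """Return {label: probability}, softmax over scaled similarities."""
    sims = np.array([cosine_similarity(image_embedding, emb)
                     for emb in label_embeddings.values()])
    logits = sims * 100  # CLIP applies a learned logit scale (~100)
    probs = np.exp(logits) / np.exp(logits).sum()
    return dict(zip(label_embeddings.keys(), probs))

# Dummy embeddings standing in for CLIP's encoders.
rng = np.random.default_rng(0)
labels = {"a castle": rng.normal(size=8), "a cat": rng.normal(size=8)}
image = labels["a castle"] + rng.normal(scale=0.1, size=8)  # near "a castle"
scores = zero_shot_classify(image, labels)
print(max(scores, key=scores.get))  # picks "a castle", the nearest embedding
```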


Now I’m thinking OpenAI should offer a CLIP API endpoint to help devs quickly deploy multimodal apps…




+1 from me. That could be absolutely amazing. I made a simple script the other day. All it does is create a CSV file with various data about images in a chosen folder. The most important data entry, the description, is still missing though, because I couldn’t be bothered to write it manually, and loading the full CLIP model would require 4300 times as much RAM.

A simple API for CLIP would be perfect for this!

Sidenote (warning: not family friendly :sweat_smile:): There are various sites that run versions of CLIP trained on their own datasets. The most notorious are Danbooru and e621. The first is made for anime waifus and classifies an image of me as a “solo 1boy”; the latter is the most outrageous, since it will classify the size of cock & balls separately into six different size categories, all the way from tiny to hyper.

So in the absence of a CLIP API, I decided to generate captions for images using Salesforce’s BLIP (Bootstrapping Language-Image Pre-training) model. It’s not perfect, but it works, and it only uses 32 MB of RAM.


The script is intended to be run from the command line. The images should be placed in an ‘images’ directory located in the same directory as the script. The script will process all images in the ‘images’ directory and save the results to a file in the same directory as the script.


The script generates a CSV file named ‘output.csv’ with the following columns:

  • Filename
  • Type (PNG, JPEG, JPG, GIF)
  • Number of Frames (for GIFs)
  • Height in Pixels
  • Width in Pixels
  • File Size in MB
  • Description (Caption generated by the BLIP model)

import os
import csv
from PIL import Image
from fractions import Fraction
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load BLIP model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Function to get dimensions of an image
def get_image_dimensions(image_path):
    with Image.open(image_path) as img:
        return img.width, img.height

# Function to get frame count of a GIF
def get_gif_frame_count(gif_path):
    with Image.open(gif_path) as img:
        frames = 0
        while True:
            try:
                img.seek(frames)
                frames += 1
            except EOFError:
                break
        return frames

# Function to compute the aspect ratio of an image
def compute_aspect_ratio(width, height):
    return Fraction(width, height).limit_denominator()

# Function to get file size in MB
def get_file_size_mb(file_path):
    return os.path.getsize(file_path) / (1024 * 1024)

# Function to write a CSV file
def write_csv(csv_file, csv_data):
    with open(csv_file, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(["Filename", "Type", "Number of Frames", "Height in Pixels", "Width in Pixels", "File Size in MB", "Description"])
        writer.writerows(csv_data)

# Function to get image caption using BLIP
def get_image_caption(image_path):
    raw_image = Image.open(image_path).convert('RGB')
    inputs = processor(raw_image, return_tensors="pt")
    out = model.generate(**inputs)
    caption = processor.decode(out[0], skip_special_tokens=True)
    return caption

# Main function
def main():
    # Define the directory containing the extracted files
    current_dir = os.path.dirname(os.path.abspath(__file__))
    extract_dir = os.path.join(current_dir, 'images')

    # List the contents of the extracted directory
    extracted_files = os.listdir(extract_dir)

    # Prepare data for the CSV
    csv_data = []

    for file in extracted_files:
        file_path = os.path.join(extract_dir, file)
        file_size_mb = get_file_size_mb(file_path)
        if file.lower().endswith(('.png', '.jpg', '.jpeg', '.gif')):
            # Derive the Type column (PNG, JPEG, JPG, GIF) from the extension
            file_type = os.path.splitext(file)[1].lstrip('.').upper()
            width, height = get_image_dimensions(file_path)
            description = get_image_caption(file_path)
            frames = 1
            if file.lower().endswith('.gif'):
                frames = get_gif_frame_count(file_path)
            csv_data.append([file, file_type, frames, height, width, file_size_mb, description])

    # Write the CSV file
    csv_file = os.path.join(current_dir, 'output.csv')
    write_csv(csv_file, csv_data)

# Run the main function
if __name__ == "__main__":
    main()