Generating visual descriptions for ut-zappos images dataset

[Beginner] I am new to the OpenAI API. My current project involves generating visual descriptions for the ut-zappos image dataset, which has about 50k images in total. As a first step, I have written the following script to process a single image, and I will later extend it to the whole dataset.

import base64
import os

import openai
from dotenv import load_dotenv

load_dotenv()

# Get the OpenAI API key from environment variable
openai.api_key = os.getenv("OPENAI_API_KEY")


def encode_image(image_path):
    """Read the image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def generate_prompt_from_image(image_path, model="gpt-4o-mini"):
    """Generate a short description for the image using GPT."""
    # Create the prompt
    prompt = "Describe this image in no more than 100 characters."

    # Attach the image as a base64 data URL; without it the model only sees the text prompt
    image_url = f"data:image/jpeg;base64,{encode_image(image_path)}"

    # Call GPT-4o-mini to generate the description (legacy openai<1.0 interface)
    response = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a visual description assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
        temperature=0.7,
        max_tokens=100,
    )

    return response["choices"][0]["message"]["content"].strip()

# Test with a single image
image_path = "/home/engineer2/Documents/h_w/ref h/Gen AI & LLMs/Project-02/ut-zap50k/images/Canvas_Sandals/7883367.409.jpg"
description = generate_prompt_from_image(image_path)
print(f"Generated Description: {description}")

Now I get the following RateLimitError, which makes sense because I have not yet paid for any API credits. I plan to buy credits in the near future, as my project requires it.

openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

My questions are as follows:

  1. First, is gpt-4o-mini a good choice for describing a single image, and how many tokens will processing one image take?
  2. Second, when I extend this to the whole dataset, what is the most economical way to send my requests? My dataset contains around 50k images, and I want to use the same prompt for each one and save the output.

Any guidance on both questions will be highly appreciated, because I am a total beginner in using LLMs via APIs, especially the OpenAI API. Kindly also explain how an industry professional or veteran would approach this problem and find the most feasible solution. I am open to suggestions and will gladly incorporate them.

Thanks.

Hello there,

  1. Yes, I’ve used gpt-4o-mini in my organization. It works well for generating image descriptions and is cheaper than the other OpenAI models. The token usage varies depending on the input, but if I’m not mistaken, the Zappos dataset is from Kaggle and focuses on shoe images, correct? Based on my testing, processing an image from this dataset typically requires around 8,500 tokens. If I may suggest, you can try using the Chat Playground to experiment with different models. Upload any image to the Playground, and it will estimate the token usage. Alternatively, you can use the Python library tiktoken to count the tokens in your text prompt directly in your code. Note that tiktoken only covers text, so for the image portion, check the usage field returned with each API response.

  2. If we assume the same token usage for all 50,000 images (50,000 × 8,500 = 425 million input tokens), the estimated cost would be around $64. To reduce this cost, you can explore the batch processing functionality (the Batch API), which could cut the cost by up to 50%, bringing it down to approximately $32.
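
To make the arithmetic behind these figures easy to rerun, here is a minimal sketch. The image count, tokens per image, and 50% batch discount come from this thread; the price of $0.15 per million input tokens is an assumption that happens to reproduce the $64 figure, so please check it against the current pricing page. Output tokens (at most 100 per image here) are ignored.

```python
# Figures from this thread
NUM_IMAGES = 50_000
TOKENS_PER_IMAGE = 8_500
BATCH_DISCOUNT = 0.50

# Assumption, not stated in the thread: input price in USD per 1M tokens
PRICE_PER_MILLION_INPUT_TOKENS = 0.15


def estimate_cost(num_images, tokens_per_image, price_per_million, discount=0.0):
    """Return the estimated input-token cost in USD for the whole dataset."""
    total_tokens = num_images * tokens_per_image
    return total_tokens / 1_000_000 * price_per_million * (1.0 - discount)


standard = estimate_cost(NUM_IMAGES, TOKENS_PER_IMAGE, PRICE_PER_MILLION_INPUT_TOKENS)
batch = estimate_cost(
    NUM_IMAGES, TOKENS_PER_IMAGE, PRICE_PER_MILLION_INPUT_TOKENS, BATCH_DISCOUNT
)

print(f"Total input tokens: {NUM_IMAGES * TOKENS_PER_IMAGE:,}")
print(f"Standard estimate:  ${standard:.2f}")
print(f"Batch estimate:     ${batch:.2f}")
```

Once you have run a handful of real requests, replace TOKENS_PER_IMAGE with the average you actually observe, since the estimate scales linearly with it.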
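
For the batch route, the input is a JSONL file with one request per line. The sketch below builds a single line using only the standard library. The helper name build_batch_line and the choice of the image file stem as custom_id are hypothetical, and the placeholder bytes stand in for a real JPEG. The request shape follows my understanding of the Batch API input format, so verify it (and the file size limits) against the official docs before uploading.

```python
import base64
import json

PROMPT = "Describe this image in no more than 100 characters."


def build_batch_line(custom_id, image_bytes, prompt=PROMPT, model="gpt-4o-mini"):
    """Return one JSONL line describing a single chat completion request."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    request = {
        "custom_id": custom_id,  # used later to match each output to its image
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": "You are a visual description assistant."},
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                        },
                    ],
                },
            ],
            "max_tokens": 100,
        },
    }
    # json.dumps escapes newlines, so the result is always a single line
    return json.dumps(request)


# Placeholder bytes instead of a real image file, just to show the shape
line = build_batch_line("7883367.409", b"\xff\xd8\xff")
parsed = json.loads(line)
print(parsed["custom_id"], parsed["method"], parsed["url"])
```

In practice you would loop over the dataset, read each file in binary mode, write one line per image to a .jsonl file, and split the work into several files if a single one grows too large.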

Hope it helps.