[Beginner] I am a beginner with the OpenAI API. My current project involves generating visual descriptions for the ut-zappos image dataset, which has about 50k images in total. I have written the following script to process a single image as a first test, and I will extend it to the whole dataset later.
import base64
import os

import openai
from dotenv import load_dotenv

load_dotenv()

# Get the OpenAI API key from the environment variable
openai.api_key = os.getenv("OPENAI_API_KEY")


def generate_prompt_from_image(image_path, model="gpt-4o-mini"):
    """Generate a short description for the image using GPT."""
    # Encode the image as base64 so it is actually sent with the request
    # (previously image_path was never used, so the model saw only the text prompt)
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    # Create the prompt
    prompt = "Describe this image in no more than 100 characters."

    # Call GPT-4o-mini to generate the description
    # (openai.ChatCompletion is the pre-1.0 SDK interface)
    response = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a visual description assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            },
        ],
        temperature=0.7,
        max_tokens=100,
    )
    return response["choices"][0]["message"]["content"].strip()


# Test with a single image
image_path = "/home/engineer2/Documents/h_w/ref h/Gen AI & LLMs/Project-02/ut-zap50k/images/Canvas_Sandals/7883367.409.jpg"
description = generate_prompt_from_image(image_path)
print(f"Generated Description: {description}")
Now I get the following RateLimitError, which makes sense because I have not yet paid for any API credits. I plan to buy credits in the near future, as my project requires it.
openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
My questions are as follows:
- First, for a single image, is gpt-4o-mini a good choice, and how many tokens will processing a single image take?
- Second, when I extend this to the whole dataset, what is the most economical way to send my requests? My dataset contains around 50k images, and I want to use the same prompt for each one and save the outputs.
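To make the two questions above concrete, here is a rough cost sketch. The per-image token counts can be measured rather than guessed: after one successful call, the response carries a `usage` field with `prompt_tokens` and `completion_tokens`. Everything numeric below is a placeholder assumption, not a real price or a measured token count; `estimate_cost` is a hypothetical helper name, and the current rates would need to be taken from the OpenAI pricing page.

```python
def estimate_cost(n_images, input_tokens_per_image, output_tokens_per_image,
                  usd_per_million_input, usd_per_million_output, discount=0.0):
    """Estimate the total cost in USD of describing n_images.

    Token counts should come from response["usage"] on a real call;
    prices and discount are inputs, not facts baked into this function.
    """
    input_cost = n_images * input_tokens_per_image * usd_per_million_input / 1_000_000
    output_cost = n_images * output_tokens_per_image * usd_per_million_output / 1_000_000
    return (input_cost + output_cost) * (1.0 - discount)


if __name__ == "__main__":
    # Placeholder numbers only: 1,000 input tokens and 30 output tokens per image,
    # at made-up prices of $1.00 and $2.00 per million tokens.
    total = estimate_cost(50_000, 1_000, 30, 1.0, 2.0)
    print(f"Estimated cost for 50k images: ${total:.2f}")
```

Running one paid request, reading its `usage` values, and plugging them in with the real prices should give a reasonable budget figure before committing to the full 50k run.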
Any guidance on both of these questions would be highly appreciated, because I am a total beginner at using LLMs through APIs, especially the OpenAI API. I would also like to know how an industry professional or veteran would approach this problem and find the most feasible solution. I am open to their suggestions and will gladly incorporate them.
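One approach I have come across for large, non-urgent jobs like this is the OpenAI Batch API, which takes a JSONL file with one request per line and returns the results asynchronously at a reduced price. Below is a minimal sketch of how I think the input file could be built for my dataset. The helper names `build_batch_line` and `write_batch_file` are my own, the `"detail": "low"` setting is an assumption aimed at keeping image token usage down, and I have not verified the per-file request and size limits, so 50k images may need to be split across several files.

```python
import base64
import json
from pathlib import Path

PROMPT = "Describe this image in no more than 100 characters."


def build_batch_line(image_path, model="gpt-4o-mini"):
    """Return one JSONL line describing a chat completion request for one image."""
    image_path = Path(image_path)
    image_b64 = base64.b64encode(image_path.read_bytes()).decode("ascii")
    request = {
        # custom_id is echoed back in the results, so the path lets me match outputs to images
        "custom_id": image_path.as_posix(),
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "max_tokens": 100,
            "messages": [
                {"role": "system", "content": "You are a visual description assistant."},
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": PROMPT},
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{image_b64}",
                                "detail": "low",  # assumption: low detail is enough here
                            },
                        },
                    ],
                },
            ],
        },
    }
    return json.dumps(request)


def write_batch_file(image_dir, out_path):
    """Write one request line per .jpg under image_dir; return the number of lines written."""
    paths = sorted(Path(image_dir).rglob("*.jpg"))
    with open(out_path, "w", encoding="utf-8") as f:
        for path in paths:
            f.write(build_batch_line(path) + "\n")
    return len(paths)
```

For example, `write_batch_file("ut-zap50k/images", "batch_input.jsonl")` would produce the file to upload. The resulting file would then be uploaded and submitted as a batch job, and the output file joined back to the images through `custom_id`.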
Thanks.