Hi everyone,
I’m trying to automate the selection & assembly of different images to illustrate a short voice over. Putting aside the cost & processing time for now, my main goal is to come up with the most relevant & accurate image selection.
Right now, despite many iterations, the results are still poor: in the best case only about 50% of the selected images are relevant, and usually fewer.
HERE IS THE PROCESS I CREATED SO FAR:
Step 1: Retrieve images from a database using 2 processes:
-Specific keywords: Generate very precise keywords for each paragraph of my voice over, hoping to retrieve images that are particularly relevant to illustrate that specific part of the voice over.
-Broad keywords: Generate general keywords related to the voice over as a whole, hoping to retrieve additional images that could be used as backfill to illustrate my voice over in case the specific images cannot be used.
→ Approximately 600 images retrieved
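To make Step 1 concrete, here is a rough sketch of how the two retrieval passes can be merged. It is only an illustration under assumptions: `search` is a hypothetical callable standing in for the image database query, and `retrieve_candidates` is a made-up name. The point is that each image keeps track of whether it came from the specific or the broad pass, and for which paragraph(s).

```python
from typing import Callable, Dict, Iterable, List


def retrieve_candidates(
    specific_keywords: Dict[int, List[str]],
    broad_keywords: List[str],
    search: Callable[[str], Iterable[str]],  # hypothetical database query
) -> Dict[str, dict]:
    """Merge specific and broad search results, keyed by image id."""
    candidates: Dict[str, dict] = {}

    # Specific pass: remember which paragraph(s) each image was retrieved for.
    for paragraph_index, keywords in specific_keywords.items():
        for keyword in keywords:
            for image_id in search(keyword):
                entry = candidates.setdefault(
                    image_id, {"source": "specific", "paragraphs": set()}
                )
                entry["paragraphs"].add(paragraph_index)

    # Broad pass: only adds images that the specific pass did not already find.
    for keyword in broad_keywords:
        for image_id in search(keyword):
            candidates.setdefault(
                image_id, {"source": "broad", "paragraphs": set()}
            )

    return candidates
```

Keeping the paragraph index with each image means the later steps can compare an image against the paragraph it was retrieved for, rather than against the whole script.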
Step 2: Apply a broad filter to remove bad images (blur, duplicates, images with text, etc.).
→ Approximately 150 images remaining.
Step 3: Send each image to GPT & retrieve a 2-3 sentence image description.
Step 4: Using a combination of the following elements:
-Image description (retrieved in step 3)
-Voice over script (2 pages)
-Contextual information related to the Voice over (short document, <20 pages, containing general info about the voice over)
I send batches of 10 images to GPT, asking it to exclude the most irrelevant / off-topic images.
→ Approximately 80 images remaining
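The batching in Step 4 can be sketched roughly as below. This is a simplified illustration, not the exact code: `ask_model` is a hypothetical callable standing in for the GPT request (it takes a prompt and returns the ids to reject), and the prompt wording is only a placeholder.

```python
from typing import Callable, Dict, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompt(
    script: str, context: str, descriptions: Dict[str, str], image_ids: List[str]
) -> str:
    """Assemble one prompt for a batch (placeholder wording)."""
    lines = [f"{image_id}: {descriptions[image_id]}" for image_id in image_ids]
    return (
        "Voice-over script:\n" + script + "\n\n"
        "Context:\n" + context + "\n\n"
        "Image descriptions:\n" + "\n".join(lines) + "\n\n"
        "Return the ids of the images that are irrelevant or off-topic, "
        "one per line."
    )


def filter_irrelevant(
    descriptions: Dict[str, str],
    script: str,
    context: str,
    ask_model: Callable[[str], List[str]],  # hypothetical stand-in for GPT
    batch_size: int = 10,
) -> List[str]:
    """Return the image ids that survive the batch-wise exclusion."""
    kept: List[str] = []
    for batch in batched(list(descriptions), batch_size):
        prompt = build_prompt(script, context, descriptions, batch)
        rejected = set(ask_model(prompt))
        # Only ids from the current batch are checked, so any unknown id
        # returned by the model is simply ignored.
        kept.extend(image_id for image_id in batch if image_id not in rejected)
    return kept
```

One thing this makes visible: every batch resends the full script and context, so each image is judged against the whole voice over rather than against a specific paragraph.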
Step 5: Final image selection:
For each paragraph of the voice over:
-First, focusing on the images obtained with specific keywords (using the image descriptions), I ask GPT to select the top 3 most relevant images to illustrate the paragraph.
-Then, focusing on the images obtained with broad keywords (using the image descriptions), I ask GPT to review the selected images & determine whether any broad image would be more relevant and should replace one of the specific images.
-I repeat this process a few times until all available images have been reviewed.
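The logic of Step 5 can be sketched roughly as follows. This is a simplified illustration under assumptions, not the exact implementation: instead of asking GPT to pick images directly, it uses a hypothetical `score(paragraph, description)` callable that returns a number (it could wrap a GPT rating or any other relevance measure), and `select_for_paragraph` is a made-up name.

```python
from typing import Callable, Dict, List, Tuple


def select_for_paragraph(
    paragraph: str,
    specific: Dict[str, str],  # image id -> description (specific keywords)
    broad: Dict[str, str],     # image id -> description (broad keywords)
    score: Callable[[str, str], float],  # hypothetical relevance score
    top_k: int = 3,
) -> List[str]:
    """Pick top_k specific images, then let broad images replace weaker picks."""

    def rank(pool: Dict[str, str]) -> List[Tuple[float, str]]:
        scored = [
            (score(paragraph, description), image_id)
            for image_id, description in pool.items()
        ]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)

    # Start from the best specific images.
    selected = rank(specific)[:top_k]

    # A broad image fills an empty slot, or replaces the weakest current
    # pick only if it scores strictly higher.
    for candidate in rank(broad):
        if len(selected) < top_k:
            selected.append(candidate)
            continue
        weakest = min(selected, key=lambda pair: pair[0])
        if candidate[0] > weakest[0]:
            selected[selected.index(weakest)] = candidate

    selected.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in selected]
```

For example, with a naive word-overlap score (purely for demonstration), a broad image that matches the paragraph well displaces the weakest specific pick:

```python
def overlap(paragraph: str, description: str) -> int:
    return len(set(paragraph.lower().split()) & set(description.lower().split()))

picked = select_for_paragraph(
    "the volcano eruption sent lava and ash into the sky",
    {"s1": "lava flowing down the volcano", "s2": "a city street",
     "s3": "volcano crater at night", "s4": "a dog"},
    {"b1": "red lava and ash from the volcano eruption", "b2": "blue sky"},
    overlap,
)
```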
What would you recommend to improve the relevance of my selection? I’m kind of out of ideas at this point.
Thank you in advance for your help.