I am trying to implement semantic/vector search for images.
To do that, I am using gpt-4o-mini to analyze each image and generate structured data from it with this prompt:
Your job is to generate json data from a given image.
Return your output in the following format:
{
description: "A description of the image. Only use relevant keywords.",
text: "If the image contains text, include that here, otherwise remove this field",
keywords: "Keywords that describe the image",
artstyle: "The art style of the image",
text_language: "The language of the text in the image, otherwise remove this field",
design_theme: "If the image has a theme (hobby, interest, occupation etc.), include that here, otherwise remove this field",
}
The data I am getting back is pretty accurate (in my eyes). I then embed the JSON with the “text-embedding-3-small” model.
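Roughly, the string I embed is built like this (a simplified sketch; the helper name is made up and the actual call to text-embedding-3-small is omitted):

```python
import json

def embedding_input(item: dict) -> str:
    """Flatten the model's JSON output into one string to embed.

    Optional fields (text, text_language, design_theme) may be absent,
    so only present, non-empty values are kept.
    """
    fields = ["description", "text", "keywords", "artstyle",
              "text_language", "design_theme"]
    parts = [item[f] for f in fields if item.get(f)]
    return " ".join(parts)

item = json.loads(
    '{"description": "white text on black background",'
    ' "text": "straight outta valhalla",'
    ' "keywords": "valhalla, vikings, norse",'
    ' "artstyle": "plain typography"}'
)
# The returned string is what gets sent to text-embedding-3-small.
print(embedding_input(item))
```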
The problem is that the search results are pretty bad.
For example: I have two images containing only text. One says “straight outta knee surgery” and one says “straight outta valhalla”.
When I search for “straight outta”, I have to turn the similarity threshold down to 0.15 to get both results.
This is my postgres search function:
CREATE OR REPLACE FUNCTION search_design_items (
  query_embedding vector(1536),
  match_threshold FLOAT,
  match_count INT
) RETURNS TABLE (id BIGINT) AS $$
BEGIN
  RETURN QUERY
  SELECT design_management_items.id  -- qualified to avoid ambiguity with the output column "id"
  FROM public.design_management_items
  WHERE 1 - (design_management_items.description_vector <=> query_embedding) > match_threshold
  ORDER BY design_management_items.description_vector <=> query_embedding ASC
  LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
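For clarity, here is a pure-Python sketch of what that function computes, using toy vectors instead of 1536-dimensional embeddings (`<=>` is pgvector's cosine distance operator, so similarity is 1 minus that distance):

```python
import math

def cosine_distance(a, b):
    # pgvector's <=> operator: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def search(items, query_vec, match_threshold, match_count):
    # items: list of (id, embedding) pairs, standing in for design_management_items
    scored = [(item_id, cosine_distance(vec, query_vec)) for item_id, vec in items]
    # WHERE 1 - distance > match_threshold
    kept = [(i, d) for i, d in scored if 1.0 - d > match_threshold]
    # ORDER BY distance ASC, LIMIT match_count
    kept.sort(key=lambda t: t[1])
    return [i for i, _ in kept[:match_count]]

items = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
print(search(items, [1.0, 0.0], 0.5, 10))  # → [1, 3]
```

With a threshold of 0.5, only items whose similarity to the query exceeds 0.5 survive, ordered from closest to farthest.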
With higher thresholds (0.5 and up) there are pretty much no results at all. This seems wrong, because every tutorial I have seen uses a threshold of 0.7 or higher.
What do I need to change in order to improve the accuracy of my search results?