How to pass a large dictionary with more than 10,000 keys in a prompt?

Task: Creating a Mapping Mechanism

I’m attempting to devise a mapping mechanism that consolidates keys with similar values into a single value. Initially, I’ve successfully implemented this mechanism for a smaller dictionary. However, I now face the challenge of extending this approach to accommodate a larger dictionary, containing over 10,000 keys.

Example Scenario:

Consider the following dictionary:

product_dict = {
    "1234": "pizzahut large",
    "4323": "large pizzahut",
    "2456": "big pizza hut",
    "23232": "pizza hut huge",
    "12312": "pizza small hut",
    "434334": "small hut pizza",
    "23421": "pizza large hut",
    "54332": "small category of pizza of pizza hut",
    "431223": "this is large category of pizza hut",
    "454334": "cheese pizza"
}

Desired Outcome:

{
    "pizzahut large": ["1234", "4323", "2456", "23232", "23421", "431223"],
    "pizzahut small": ["12312", "434334", "54332"]
}

Request:

I’m seeking guidance on how to efficiently handle this task for a larger dictionary. Any insights or suggestions would be greatly appreciated. Thank you!

You could consider using embeddings?

They’re basically made for this stuff.

Will it do the same task effectively, and do they store the relation between keys and values?

well, you’d map

    key: string

into

    key: {
        value: string
        embedding: number[]
    }

into

    key: {
        value: string
        closest_category: string
    }

and then reduce it into

    category: key[]   

or something

https://platform.openai.com/docs/guides/embeddings
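To make the mapping above concrete, here is a minimal Python sketch of the three steps (embed each value, pick the closest category, reduce into category-to-keys lists). Several things in it are assumptions and not from the thread: the `CATEGORIES` list, the `min_similarity` cutoff of 0.5, the helper names, and above all `toy_embed`, which is a hand-built stand-in that only counts a few words so the example runs offline. In practice you would swap `toy_embed` for a call to a real embedding model, which is what would let "big" and "huge" land near "large" without a hand-written synonym list.

```python
import math
from collections import defaultdict

# Hypothetical canonical labels; the thread does not say where these come from.
CATEGORIES = ["pizzahut large", "pizzahut small"]

# Toy stand-in for a real embedding model (assumption, for offline demo only).
# Each tuple is one vector dimension; synonyms are hand-grouped on purpose.
TOY_FEATURES = [("pizza",), ("hut",), ("large", "big", "huge"), ("small",), ("cheese",)]


def toy_embed(text):
    """Return a small count vector; replace with a real embedding call."""
    text = text.lower()
    return [float(sum(text.count(word) for word in group)) for group in TOY_FEATURES]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_closest_category(items, categories, embed, min_similarity=0.5):
    """Map key -> value -> closest category, then reduce to category -> [keys]."""
    # Embed each category once, not once per item.
    category_vectors = {name: embed(name) for name in categories}
    grouped = defaultdict(list)
    for key, value in items.items():
        vector = embed(value)
        best = max(categories, key=lambda name: cosine(vector, category_vectors[name]))
        # Skip values that are not close enough to any category.
        if cosine(vector, category_vectors[best]) >= min_similarity:
            grouped[best].append(key)
    return dict(grouped)


product_dict = {
    "1234": "pizzahut large",
    "4323": "large pizzahut",
    "2456": "big pizza hut",
    "23232": "pizza hut huge",
    "12312": "pizza small hut",
    "434334": "small hut pizza",
    "23421": "pizza large hut",
    "54332": "small category of pizza of pizza hut",
    "431223": "this is large category of pizza hut",
    "454334": "cheese pizza",
}

result = group_by_closest_category(product_dict, CATEGORIES, toy_embed)
print(result)
```

On the key/value relation: the embedding itself does not store it, you keep the key next to its vector yourself, as `items.items()` does here. For 10,000+ keys the same loop applies. The difference is that nothing has to fit in a single prompt: each value is embedded once (in batches, if the embedding API accepts a list of inputs) and the vectors can be cached for reuse.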
