Mapping items of a list using an external JSON file

Hey there :slight_smile: I'm a new GPT API user and have been tasked with quite the topic here at work.

Here is what I want to do (exact details changed due to NDA and all that):

I have a collection of PDF docs, let's say of video games. These docs are parsed to JSON using a specific schema I defined; this works rather well.

I also have a JSON file with all the video game categories; it's an array of 100 categories, each with a title and an ID.

After parsing a PDF to JSON, I want to use GPT to add a category ID and title to the object, using the external JSON list of categories to find the best match.
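A minimal sketch of that last step, assuming the category file is an array of objects with `id` and `title` keys. The key names, the sample data, the output field names, and the `attach_category` helper are all hypothetical. Whichever way the model picks a category, checking the returned ID against the file keeps invented categories out of the result:

```python
import json

# Hypothetical stand-in for the external category file (array of id/title objects).
CATEGORIES_JSON = """
[
  {"id": 1, "title": "Role-playing"},
  {"id": 2, "title": "Racing"},
  {"id": 3, "title": "Puzzle"}
]
"""


def attach_category(game: dict, category_id: int, categories: list[dict]) -> dict:
    """Return a copy of the parsed object with the category ID and title added."""
    by_id = {c["id"]: c for c in categories}
    if category_id not in by_id:
        # The model answered with an ID that does not exist in the file.
        raise ValueError(f"unknown category id: {category_id}")
    match = by_id[category_id]
    # Build a new dict so the original parsed object is left untouched.
    return {**game, "categoryId": match["id"], "categoryTitle": match["title"]}


categories = json.loads(CATEGORIES_JSON)
parsed_game = {"name": "Example Kart", "summary": "Arcade kart racer"}
result = attach_category(parsed_game, 2, categories)
print(result)
```

Taking the title from the file rather than from the model's answer means only the ID has to be trusted.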

At the moment I am not sure what the best way to do this is; as far as I can see there are a few options. Currently I upload the video game doc to my account using the JS library, create an assistant with the code interpreter tool to run OCR on the document, and then give it the schema and exact instructions. Would it make sense for me to upload the video game categories JSON as well and add an additional message to the thread explaining how the assistant should use this file? Or should I define a function, and in that function's description explain that the assistant should map categories using the previously uploaded file?
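For the function option, one possible shape is sketched below, in Python for brevity even though the thread uses the JS library. It builds a function tool definition whose `category_id` parameter is limited to the IDs from the category file through a JSON Schema `enum`. The function name `set_category`, the parameter name, and the sample categories are hypothetical, and the sketch only builds the definition; registering it with the assistant is left out.

```python
import json


def build_category_tool(categories: list[dict]) -> dict:
    """Build a function tool definition restricted to the known category IDs."""
    ids = [c["id"] for c in categories]
    # Put the id/title pairs in the description so the model can see the options.
    listing = "\n".join(f'{c["id"]}: {c["title"]}' for c in categories)
    return {
        "type": "function",
        "function": {
            "name": "set_category",  # hypothetical name
            "description": (
                "Record the best-matching category for the parsed game.\n"
                "Allowed categories (id: title):\n" + listing
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "category_id": {
                        "type": "integer",
                        "enum": ids,  # only IDs from the category file
                        "description": "ID of the best-matching category.",
                    }
                },
                "required": ["category_id"],
            },
        },
    }


# Hypothetical sample of the category file contents.
categories = [
    {"id": 1, "title": "Role-playing"},
    {"id": 2, "title": "Racing"},
    {"id": 3, "title": "Puzzle"},
]
tool = build_category_tool(categories)
print(json.dumps(tool, indent=2))
```

The `enum` steers the model toward valid IDs, but the returned arguments are still worth validating against the file before use.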

On the topic of prompt engineering/assistants/threads, does it make sense to explain a lot of steps in one prompt/message, or should one stream the responses and reactively add appropriate messages?

Sorry for the newb take, any help greatly appreciated :slight_smile:

Have you done some basic testing of the mapping accuracy just by providing the list of categories in the prompt as part of the context?

My concern is that 100 different categories is quite a lot and you may run into accuracy issues depending on the nature of the categories. Creating a fine-tuned model or using an embeddings-based classification might be options to consider. Depending on what works best, this would then also impact the specific integration with the Assistant.
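To illustrate the embeddings-based idea: each category title and each parsed document is turned into a vector, and the category whose vector is closest to the document's (by cosine similarity) wins. The sketch below is self-contained, so `embed` is only a toy word-count stand-in for a real embedding model; the sample categories and the `top_categories` helper are hypothetical as well.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_categories(text: str, categories: list[dict], k: int = 1) -> list[dict]:
    """Return the k categories whose titles are most similar to the text."""
    query = embed(text)
    scored = [(cosine(query, embed(c["title"])), c) for c in categories]
    # Sort on the score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


# Hypothetical sample of the category file contents.
categories = [
    {"id": 1, "title": "Role-playing adventure"},
    {"id": 2, "title": "Kart racing"},
    {"id": 3, "title": "Puzzle"},
]
best = top_categories("A fast kart racing game with split-screen multiplayer", categories)
print(best)
```

With a larger `k`, the same function yields a shortlist of candidates, which could be placed in the prompt in place of every category.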

I have indeed, using ChatGPT and by creating my own GPT with uploaded documents. It seemed to work rather well if I was explicit about what I wanted it to do. The file is actually closer to 200 categories, though, which I think is too much to put into the context every single time.