Welcome to the community!
Sounds like you’re using the API? Do you have access to the Advanced Data Analysis plugin?
The approach itself doesn’t look bad; the problem seems to be how you’re combining the data and feeding it to GPT. You may be missing a “parsing” layer.
If it were up to me, I would store the protocol strings in some kind of dictionary data structure or database. Lots of people have their own methods for this. I’m a tinkerer so I just make h5 files, but it’s more or less up to you.
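Just as a rough sketch of the h5 route (assuming h5py is installed; the file name, protocol IDs, and text are placeholders, not your actual data):

```python
import h5py

# Hypothetical protocol strings keyed by ID
protocols = {
    "protocol_001": "Step 1: ... Step 2: ...",
    "protocol_002": "Step 1: ... Step 2: ...",
}

with h5py.File("protocols.h5", "w") as f:
    str_dtype = h5py.string_dtype(encoding="utf-8")
    for name, text in protocols.items():
        # One dataset per protocol keeps each string individually addressable later
        f.create_dataset(name, data=text, dtype=str_dtype)

# Reading a single protocol back:
with h5py.File("protocols.h5", "r") as f:
    raw = f["protocol_001"][()]
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
```

A plain dict or a small database works just as well; the point is that each protocol stays its own retrievable unit instead of one giant string.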
Remember, these models have input and context-length limits. With something like Advanced Data Analysis you can hand it an entire database file, but it cannot handle an extremely long string in a single prompt.
Also, are you embedding the entire string as a single embedding? I’d recommend parsing the data so that each protocol gets its own vector embedding, and using those for context retrieval. If that’s already your intent, make sure your code is actually set up to do it; see the sketch below.
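Here’s a minimal sketch of per-protocol embedding plus similarity retrieval, assuming the openai>=1.0 Python client and numpy; the model name, protocol dict, and `retrieve` helper are illustrative, not your setup:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

protocols = {
    "protocol_001": "Step 1: ... Step 2: ...",
    "protocol_002": "Step 1: ... Step 2: ...",
}

# One embedding per protocol, not one embedding for the whole concatenated string
names = list(protocols)
resp = client.embeddings.create(model="text-embedding-3-small",
                                input=[protocols[n] for n in names])
vectors = np.array([d.embedding for d in resp.data])

def retrieve(query: str, top_k: int = 3) -> list[str]:
    """Return the IDs of the protocols most similar to the query."""
    q = np.array(client.embeddings.create(model="text-embedding-3-small",
                                          input=[query]).data[0].embedding)
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [names[i] for i in np.argsort(sims)[::-1][:top_k]]
```

Then you only pull the top few matching protocols into the prompt as context, instead of the whole collection.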
You’re very much on the right track; from what you’ve shown us here, the missing piece is an extra parsing layer between your data and GPT. It CAN summarize large sets of data, so your goal is very achievable, you just can’t feed it the elephant in one prompt. Do it iteratively, or give GPT a database so it can summarize each string iteratively itself.
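Iteratively could look something like this (again a sketch, assuming the openai>=1.0 client; model name, prompts, and the protocol dict are placeholders): summarize each protocol on its own, then combine the short summaries in a final pass rather than pushing everything through one request.

```python
from openai import OpenAI

client = OpenAI()

protocols = {
    "protocol_001": "Step 1: ... Step 2: ...",
    "protocol_002": "Step 1: ... Step 2: ...",
}

def summarize_protocol(name: str, text: str) -> str:
    # One short request per protocol keeps each call well under the context limit
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Summarize the following protocol in a few sentences."},
            {"role": "user", "content": f"{name}:\n{text}"},
        ],
    )
    return resp.choices[0].message.content

summaries = {name: summarize_protocol(name, text) for name, text in protocols.items()}

# The combined summaries are short enough to feed back in for an overall summary
combined = "\n".join(f"{n}: {s}" for n, s in summaries.items())
```

Good luck, and let us know how it goes!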