Using the LLM to Categorize Responses

I have a RAG query system where I am logging each question/response pair. I am thinking of using the model to assign a “subject” or “category” designation to each record. It is a large regulatory dataset, and I don’t have an established category system in place, so my plan right now is to let the model come up with its own terms based upon its analysis of the q/a text.

I instruct it to examine the q/a text. If the text matches an existing category, it assigns that category to the record; otherwise it creates a new category, assigns it, and adds the new category to the list.
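The match-or-create step described above could be sketched roughly like this. It is only a sketch under assumptions: `suggest` is a hypothetical callable that wraps whatever model call is in use and returns one label, and `FALLBACK` is a made-up label for empty output. The normalization is there so that small differences in case or spacing do not create duplicate categories.

```python
from typing import Callable, List

FALLBACK = "Uncategorized"  # hypothetical label used when the model returns nothing


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so near-identical labels compare equal."""
    return " ".join(label.lower().split())


def assign_category(
    qa_text: str,
    categories: List[str],
    suggest: Callable[[str, List[str]], str],
) -> str:
    """Return the category for one q/a record, growing `categories` if needed.

    `suggest` is an assumed wrapper around the model: it receives the q/a text
    and the current category list and returns a single label.
    """
    proposed = " ".join(suggest(qa_text, categories).split())
    if not proposed:
        return FALLBACK

    key = normalize(proposed)
    for existing in categories:
        if normalize(existing) == key:
            return existing  # reuse the stored spelling of the category

    categories.append(proposed)  # new category: add it to the running list
    return proposed
```

Storing the returned label with each record means a later count per label (for example with `collections.Counter`) answers the "how many questions in this area" kind of question.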

A rather informal way to do this, I know, but I’m just looking at analytics down the road. “How many people had questions in this area?” sort of thing.

I’m not sure if this is a good way to do this, so I’m seeking comments from others with a bit more experience at it.

Sounds like a solid plan to me.

Personally (and semi-related?), I’ve been using LDA and Code Interpreter to get categories from keyword lists…

Well, finally put the code together. Wasn’t too difficult.

The biggest problem is coming up with a prompt for gpt-3.5-turbo-16k. In a database that consists exclusively of real estate regulations, it categorizes everything as “Real Estate Regulations”. Then I tell it not to do that, and it finds another global term to use. Then I tell it not to use that one either, and it goes back to using “Real Estate Regulations”. Arrrggghhhh!!!
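One possible way to attack the global-term problem is to put the banned labels in the prompt and also check the answer in code, so a too-broad label can be rejected and retried instead of relying on the model to remember the instruction. The sketch below is an assumption throughout: the prompt wording is untested against any model, and `build_prompt` and `is_too_broad` are hypothetical helper names.

```python
from typing import List


def build_prompt(qa_text: str, categories: List[str], banned: List[str]) -> str:
    """Build a categorization prompt that lists existing and banned labels.

    The wording is illustrative only and has not been tuned for any model.
    """
    existing = "\n".join(f"- {c}" for c in categories) or "(none yet)"
    blocked = "\n".join(f"- {b}" for b in banned) or "(none)"
    return (
        "Every record in this dataset is about real estate regulations, "
        "so a label that describes the whole dataset is not useful.\n"
        "Pick the narrowest topic that fits the text below.\n\n"
        f"Existing categories (reuse one if it fits):\n{existing}\n\n"
        f"Never answer with any of these labels:\n{blocked}\n\n"
        f"Text:\n{qa_text}\n\n"
        "Answer with the category name only."
    )


def is_too_broad(label: str, banned: List[str]) -> bool:
    """True if the label matches a banned term, ignoring case and extra spaces."""
    key = " ".join(label.lower().split())
    return any(key == " ".join(b.lower().split()) for b in banned)
```

If `is_too_broad` returns True for the model's answer, the caller could add that label to `banned` and ask again, or flag the record for manual review.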

As usual, gpt-4 does it perfectly (well, pretty well).

But now I’m set up for my next project: creating text-to-SQL statements for analytics. That should be fun.