Consistency problems in intent analysis

Hi there! Here is the case:

I have compiled a prompt for intent analysis to receive the processing results as Resolve Reject. I’m sending a test array of messages, everything is going well. I start the prod, and garbage start to drop. I get statuses like resolve for that should be rejected. I run those messages in the test array and it’s flagged as rejected. I’m trying to put the same bunch of messages in prod and it gets different results time by time. What could be the reason for this behavior?

Welcome!

Without more details (prompt, model, etc) it’s tough to say.

Current LLMs are not really known for being 100% accurate all the time either. If you’re using a smaller model like 4o-mini, I’d try to give it a few examples.

4o mini, right. Ill provide more info later