How to "debug" a chat completion API response?

Short question: how can I “debug” a chat completion response so that I can be assured a particular piece of text always appears in it?

Here’s what I do:

I call the OpenAI API chat completion using curl

I pass the “system” content and the user’s messages in an array every time, so that OpenAI “remembers” the previous conversation.

The “system” content asks the model to play a certain role that it is already very well trained on.

When the model replies to the first message from the human, I would like it to include a certain text, without fail, every time.

I tried passing this text in the system content after the first message is sent by the user, but the model doesn’t always include it :frowning:

In my curl request, I also send the “user” value each time to ensure some kind of persistence. When I test with different users, each unique user ID gets its own separate chat history.

I don’t know why the model doesn’t always include the text that I want it to include.

Is there a smart way to debug these responses and see why I am not getting the responses I expect?

Are you sending the same “system” message every time? That is, the chat completion API is stateless, so you have to send everything the model needs to know in each and every prompt.
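For example, here is a rough sketch in PHP (the model name, the reminder text, and the conversation content are placeholders, not your actual values): every single request has to carry the system message plus the entire conversation so far.

// Every request repeats the system message and the whole history so far.
$messages = [
    ['role' => 'system',    'content' => 'You are a helpful tutor. Always include the line "Type HELP for the menu." in your first reply.'],
    ['role' => 'user',      'content' => 'Hi, can you explain recursion?'],
    ['role' => 'assistant', 'content' => '...the model\'s previous reply...'],
    ['role' => 'user',      'content' => 'Can you give an example?'],
];

$payload = [
    'model'    => 'gpt-4',       // placeholder model name
    'messages' => $messages,
    'user'     => $uniqueUserId, // the per-user ID you mentioned
];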

You can “debug” by printing and logging the entire chat completion response. In PHP, this would be “print_r”.
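A minimal sketch of that, assuming you decode the raw JSON body that curl returns:

// Dump the entire decoded response so you can inspect every field.
$response = json_decode($rawResponseBody, true);
error_log(print_r($response, true));

// Useful fields to check: the text the model actually returned,
// and why it stopped generating.
echo $response['choices'][0]['message']['content'];
echo $response['choices'][0]['finish_reason'];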

The system message is always present

The system message changes just after the user’s first message, to ask GPT-4 to return a static message. This modified system message is not used every time.

I want to actually debug the logic GPT-4 used: how and why it combined the system prompt with the user prompts to produce a particular message.

I guess this is not available anywhere :frowning:

So… basically one just designs some system prompts and does some tests and hopes for the best

Sounds pretty hopeless

Some of it is a “black box”, but some of it can be understood if you dig down into the AI rabbit hole… transformers, neural nets, machine learning, et al…

Is there a specific prompting problem you’re running into?

Basically. Frustrating, tedious, difficult – absolutely. But, I wouldn’t say hopeless.

If you are able, add some other models to your system. I have certain prompts that are specific to my use cases, which I use to test across models to gauge their responses. It doesn’t really help with debugging, but it does help me see where models are strong and weak.


I hear you

I think it would be a very interesting feature to have

How do we request this? Is there a way? Their shared email inbox seems to be flooded. Once, I got a response after something like a few weeks :slight_smile:

Yes, please see my previous messages in this thread

You can’t request it. You have to build it.

Something like this:

public function solrai_getChatCompletion($messages, $apiOpenAI) {

    // By the time we get here, $this->model should represent
    // the model that will be used for this query.

    // Get provider info for $this->model
    $this->provider = $this->modelListArray[$this->model]['provider'];
    $this->maxTokens = $this->modelListArray[$this->model]['max_tokens'];
    $this->apiEndpoint = $this->modelListArray[$this->model]['endpoint'];
    $this->apiKey = $this->modelListArray[$this->model]['api_key'];
    $this->apiSecret = $this->modelListArray[$this->model]['secret_key'];

    // Dispatch to the provider-specific completion method.
    if ($this->provider === 'mistral') {
        return $this->solrai_getChatCompletionMistral($messages);
    }
    if ($this->provider === 'openai') {
        return $this->solrai_getChatCompletionOpenAI($messages, $apiOpenAI);
    }
    if ($this->provider === 'aws') {
        return $this->solrai_getChatCompletionAWS($messages);
    }
    if ($this->provider === 'google') {
        return $this->solrai_getChatCompletionGoogle($messages);
    }

    // No matching provider: nothing to return.
    return NULL;
}

I have methods that will create the correct $message format based upon the model. This is pretty cool because I’m able to slot in new models rather quickly (especially if they are from the same provider) rather than re-writing existing code to accommodate new/updated models.
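Those formatting methods aren’t shown here, but the rough idea is a small per-provider adapter. The method name and the Google mapping below are only an illustrative sketch, not the exact code:

// Illustrative sketch of a per-provider message formatter.
public function solrai_formatMessages(array $messages) {
    switch ($this->provider) {
        case 'openai':
        case 'mistral':
            // These providers accept the role/content array as-is.
            return $messages;

        case 'google':
            // Gemini-style endpoints expect "contents" with "parts";
            // system text is handled separately and is omitted here.
            $contents = [];
            foreach ($messages as $m) {
                if ($m['role'] === 'system') {
                    continue;
                }
                $contents[] = [
                    'role'  => ($m['role'] === 'assistant') ? 'model' : 'user',
                    'parts' => [['text' => $m['content']]],
                ];
            }
            return $contents;

        default:
            // Unknown provider: hand the messages back unchanged.
            return $messages;
    }
}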

Thanks for the tip

From your message, I see that you are handling different message formats and types depending on the model. Nice.

However, I am still at a loss as to how to “debug” the messages returned by OpenAI, vis-à-vis the system prompt I have given it, in combination with the users’ messages :frowning:

I guess one way to do it is to use ChatGPT and/or a competing model like Google Gemini Pro, for example, and ask it to evaluate the responses. But then… the question still arises: how do I validate that Google is doing it correctly?

The goal is to do all that automatically, as validating responses manually is not pragmatic!

Thanks for any insights you may have

I am doing exactly that now. I have developed a ranking system where I use a model to evaluate all responses. I am using gpt-3.5-turbo-16k to evaluate the responses from a variety of models, from gpt-4-turbo to Gemini Pro to Claude v3 to Mistral Medium and Large.

How does one do this? You have to establish the criteria by which the evaluation should be done. In my case, it’s pretty simple: was the question answered, and if so, to what degree?
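Stripped down, the judging call looks something like this (the rubric wording and variable names here are invented for the example, not my production prompt):

// Ask a "judge" model to score another model's answer against the question.
$judgePrompt = [
    ['role' => 'system', 'content' =>
        'You are an evaluator. Rate how well the ANSWER addresses the QUESTION ' .
        'on a scale of 1 to 10 and give a one-sentence reason. ' .
        'Reply as JSON: {"score": <1-10>, "reason": "<one sentence>"}'],
    ['role' => 'user', 'content' =>
        "QUESTION:\n{$question}\n\nANSWER:\n{$candidateAnswer}"],
];

// Reuse the same completion plumbing, pointed at the judge model.
$this->model = 'gpt-3.5-turbo-16k';
$evaluation  = $this->solrai_getChatCompletion($judgePrompt, $apiOpenAI);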

Agreed, but to answer your previous question, the only way you’re going to know if the evaluating model is doing it correctly is to check its evaluations manually.

I have no idea what you are really hoping to do here or why, but I can tell you that if you intend to totally remove any human oversight, you’re setting the project up for disaster. Just my opinion, not a fact.
