The OpenAI GPT-3.5-turbo model does not consistently provide satisfactory responses

At times, the GPT-3.5-turbo model provides inaccurate descriptions of the student comments, occasionally presenting views that contradict the sentiments expressed by the students.

This is the code

public function openai($dataa)
    {
        // dd($dataa);
        $messages = [];
        $combinedContent = ""; // Initialize an empty string
        // dd($data);
        // Assuming $dataArray is your array of strings
        if (!is_array($dataa)) {
            // dd($dataa);
            $dataa = [$dataa];
            
        }
        foreach ($dataa as $item) {
            
            $combinedContent .= $item . " next Comment is  ";
        }
        $messages[] = ['role' => 'assistant', 'content' => "Please analyze the following list of student comments and provide a concise summary of the main themes, opinions, and feedback expressed by the students. The comments are as follows \n\"$combinedContent\" Please ensure that the summary captures the key points and sentiments expressed in these comments. Thank you!"];
        // dd($combinedContent);

        // Now, $messages contains the conversation data in the required format

        // Replace with your API key
        $apiKey = '';
        

        $client = new Client([
            'verify' => false, // Disable SSL verification
        ]);

        $response = $client->post('https://api.openai.com/v1/chat/completions', [
            'headers' => [
                'Authorization' => 'Bearer ' . $apiKey,
            ],
            'json' => [
                'messages' => $messages,
                'max_tokens' => 50, // Adjust the maximum length of the response as needed.
                'model' => "gpt-3.5-turbo",
            ],
        ]);

        $data = json_decode($response->getBody(), true);
        // dd($data);
        $description = $data['choices'][0]['message']['content'];
        // dd($description);
        return $description;
    }

Problem 1: You are giving instructions in the “assistant” role. That is only for showing the AI past conversation turns, or advanced techniques like showing it how it should respond.

Instructing the AI should either be in a “system” or “user” role message.

Problem 2: You are not clearly delineating the instructions from the data. Here is typical use of roles:

system: Assess student reviews of teachers. Summarize in one paragraph.
user: student review: $combinedContent

Processing individual records is best, yet you have language such as “the following list”.
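As a rough sketch of the role split described above, the messages for one individual review could be assembled like this. The helper name `buildReviewMessages` is made up for illustration; only the `role`/`content` message shape comes from the chat completions API.

```php
<?php
// Hypothetical helper (not from the original code): builds the messages
// for ONE student review, keeping instructions and data in separate roles.
function buildReviewMessages(string $review): array
{
    return [
        // Instructions go in the system message.
        ['role' => 'system', 'content' => 'Assess student reviews of teachers. Summarize in one paragraph.'],
        // The data goes in the user message, clearly labeled.
        ['role' => 'user', 'content' => 'student review: ' . trim($review)],
    ];
}

$messages = buildReviewMessages('He is a very good teacher.');
echo json_encode($messages, JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting array would be passed as the `messages` value of the request body, in place of the single "assistant" message.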

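Another parameter you can include along with the model is 'top_p' => 0.5, which restricts sampling to the most likely tokens (those that together make up the top 50% of the probability mass), so the AI's wording becomes less random.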
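To give an intuition for what top_p does, here is a toy illustration of nucleus filtering. It is not OpenAI's actual implementation, and the token probabilities are invented for the example: candidates are sorted by probability and kept until their cumulative probability reaches top_p.

```php
<?php
// Toy illustration of nucleus (top_p) filtering. The function name and the
// probabilities below are made up; real models work over a full vocabulary.
function nucleusFilter(array $probs, float $topP): array
{
    arsort($probs); // highest probability first, keys preserved
    $kept = [];
    $cumulative = 0.0;
    foreach ($probs as $token => $p) {
        $kept[$token] = $p;
        $cumulative += $p;
        if ($cumulative >= $topP) {
            break; // enough probability mass collected
        }
    }
    return $kept;
}

$candidates = ['good' => 0.55, 'great' => 0.25, 'bad' => 0.15, 'awful' => 0.05];
print_r(nucleusFilter($candidates, 0.5)); // keeps only 'good'
print_r(nucleusFilter($candidates, 0.9)); // keeps 'good', 'great' and 'bad'
```

With a lower top_p, unlikely words such as 'bad' never become candidates, which is why the output gets more repeatable.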

I assume you have other unseen code that actually evaluates and gives the score and the short ‘very very bad’.

If the AI is just not following instructions well at all, you can try gpt-3.5-turbo-0301, which hasn’t suffered degrading changes recently.

I made changes to the user role

public function openai($dataa)
    {
        // dd($dataa);
        $messages = [];
        $combinedContent = ""; // Initialize an empty string
        // dd($data);
        // Assuming $dataArray is your array of strings
        if (!is_array($dataa)) {
            // dd($dataa);
            $dataa = [$dataa];
            
        }
        foreach ($dataa as $item) {
            
            $combinedContent .= $item . "   ";
        }
        $messages[] = ['role' => 'user', 'content' => "Assess student reviews of teachers\n\"$combinedContent\"Summarize in one paragraph"];
        // dd($combinedContent);

        // Now, $messages contains the conversation data in the required format

        // Replace with your API key
        $apiKey = '';
        

        $client = new Client([
            'verify' => false, // Disable SSL verification
        ]);

        $response = $client->post('https://api.openai.com/v1/chat/completions', [
            'headers' => [
                'Authorization' => 'Bearer ' . $apiKey,
            ],
            'json' => [
                'messages' => $messages,
                'max_tokens' => 50, // Adjust the maximum length of the response as needed.
                'model' => "gpt-3.5-turbo",
                'top_p' => 0.5,
            ],
        ]);

        $data = json_decode($response->getBody(), true);
        // dd($data);
        $description = $data['choices'][0]['message']['content'];
        // dd($description);
        return $description;
    }

The output

I don’t think you can get good results this way. They will either be wrong from the start or different every time.

You will have to define what is good and what is bad, and let the assessment be done by multiple agents that only have to answer yes or no. Then calculate a score later from all the agents' results.

I am doing about the same for code assessments. But I am also giving weights, because what an organisation or a supervisor/manager considers good code may differ.

Sometimes they want to value code readability over code performance or modularity, or vice versa.
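A minimal sketch of how such a weighted score could be calculated, assuming the yes/no answers have already been collected per criterion. The function name, criteria, votes and weights are all invented for illustration; the real setup described here is not shown in this thread.

```php
<?php
// Hypothetical aggregation step: each criterion was answered "yes" or "no"
// by several agents; weights express how much the organisation cares about it.
function weightedScore(array $answers, array $weights): float
{
    $total = 0.0;
    $weightSum = 0.0;
    foreach ($answers as $criterion => $votes) {
        if (count($votes) === 0) {
            continue; // no agent answered this criterion
        }
        $weight = $weights[$criterion] ?? 1.0; // default weight if none given
        $yes = count(array_filter($votes, fn ($v) => strtolower(trim($v)) === 'yes'));
        $total += $weight * ($yes / count($votes)); // share of "yes" votes
        $weightSum += $weight;
    }
    return $weightSum > 0 ? $total / $weightSum : 0.0;
}

$answers = [
    'readability' => ['yes', 'yes', 'no', 'yes'],
    'performance' => ['no', 'no', 'yes', 'no'],
];
$weights = ['readability' => 3.0, 'performance' => 1.0];
echo weightedScore($answers, $weights), PHP_EOL; // 0.625
```

Because each agent only answers yes or no, a single odd answer shifts the score slightly instead of flipping the whole evaluation.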

So I am running like 3000 agents on a single programmer's code evaluation to get good results with an accurate score value.

The more specific the task you give the model, and the better you know what the model is capable of and where you need to find ways other than GPT, the better the end results.

I don’t think it is possible to do what you are trying to do there even with GPT-4.

And I really hope that you are taking this as seriously as I do. A score value given to someone might be used against them somewhere, e.g. for mass layoffs. Do you want to lose a job because a wrong evaluation result put you in the bottom 10%? I am not allowing this on my system; I use it only to suggest or create courses for upskilling.

The API cost for a single quality evaluation to create an accurate skill matrix is about $10 per developer.

I’d produce a log of the input text right before the prompt is filled in with it, or create a variable for “user_role_json” that is fully-assembled, and log that.
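A minimal sketch of that logging step, assuming plain PHP's `error_log()`; in a Laravel app you would probably use its logger instead. The helper name and the "student review:" label are assumptions for illustration.

```php
<?php
// Hypothetical helper: assembles the complete user message first, logs it,
// then returns it, so the log shows exactly what the model will receive.
function buildAndLogUserMessage(array $reviews): array
{
    $userRoleJson = [
        'role' => 'user',
        'content' => implode("\n", array_map(
            fn ($r) => 'student review: ' . trim($r),
            $reviews
        )),
    ];
    // Writes to the configured PHP error log (stderr on the CLI).
    error_log('user_role_json: ' . json_encode($userRoleJson, JSON_UNESCAPED_UNICODE));
    return $userRoleJson;
}

$message = buildAndLogUserMessage(['He is a very good teacher', ' He knows his job ']);
```

If the logged content is empty or is not the raw comments, the problem is in the code that gathers the reviews rather than in the model.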

If the reduction of top_p gives you the same results every time, you might have no “student review” input at all.

Too much is missing to evaluate the quality otherwise.

I think you are right. This is my code after modifying it:

public function test()
    {
        $activeConfiguration = configuration::where('active', 1)->first();
        $data = Feedback::all()->where('hook', $activeConfiguration->vaule);


        $staff_data = [];

        foreach ($data as $value) {
            $staff = Staff::where('id', $value->staff_id)->first();

            if ($staff) {
                $teacherName = $staff->Name;
                $rating = $value->Rating;
                $description = $value->Comment;

                // Check if the teacher name already exists in $staff_data
                if (array_key_exists($teacherName, $staff_data)) {
                    // If it does, add the rating to the existing entry
                    $staff_data[$teacherName]['Rating'] += $rating;
                    $staff_data[$teacherName]['Descriptions'][] = $description;
                } else {
                    // If it doesn't, create a new entry
                    $staff_data[$teacherName] = [
                        'Name' => $teacherName,
                        'Rating' => $rating,
                        'Descriptions' => [$description],
                        'Data_Descriptions' => [$description],
                        // 'Description' => $this->openai(),
                    ];
                }
            }
        }

        // dd($this->openai($staff_data[$teacherName]['Descriptions']));
        // Modify the 'Descriptions' data for each teacher (e.g., concatenate all descriptions into one string)
        foreach ($staff_data as &$teacherData) {
            // dd($staff_data[$teacherName]['Descriptions']);
            $teacherData['Descriptions'] = $this->openai($staff_data[$teacherName]['Descriptions']);
        }
public function openai($descriptions)
{
    // Ensure descriptions is an array
    $descriptions = is_array($descriptions) ? $descriptions : [$descriptions];
    
    // Prepare the conversation messages
    $messages = [
        ['role' => 'system', 'content' => 'Assess student reviews of teachers. Summarize in one paragraph.'],
        ['role' => 'user', 'content' => implode(" Students next remarks:  ", $descriptions)],
    ];
    dd($messages);


    // Set your API key
    $apiKey = '';

    // Make the API request to OpenAI
    $response = $this->makeOpenAIRequest($messages, $apiKey);

    // Extract and return the description from the response
    return $response['choices'][0]['message']['content'];
}

private function makeOpenAIRequest($messages, $apiKey)
{
    $client = new Client([
        'verify' => false, // Disable SSL verification
        'proxy' => 'http://127.0.0.1:8080', // Replace with your proxy settings
    ]);

    $response = $client->post('https://api.openai.com/v1/chat/completions', [
        'headers' => [
            'Authorization' => 'Bearer ' . $apiKey,
        ],
        'json' => [
            'messages' => $messages,
            'max_tokens' => 50, // Adjust as needed
            'model' => 'gpt-3.5-turbo-0301',
            'top_p' => 0.5,
        ],
    ]);

    return json_decode($response->getBody(), true);
}

These are the API requests sent to OpenAI.
First POST request

POST /v1/chat/completions HTTP/2
Host: api.openai.com
User-Agent: GuzzleHttp/7
Content-Type: application/json
Authorization: Bearer 
Content-Length: 276

{"messages":[{"role":"system","content":"Assess student reviews of teachers. Summarize in one paragraph."},{"role":"user","content":"He is a Very good Teacher Students next remarks:  He is Very Good And Know his Job"}],"max_tokens":50,"model":"gpt-3.5-turbo-0301","top_p":0.5}

Second POST request

POST /v1/chat/completions HTTP/2
Host: api.openai.com
User-Agent: GuzzleHttp/7
Content-Type: application/json
Authorization: Bearer 
Content-Length: 384

{"messages":[{"role":"system","content":"Assess student reviews of teachers. Summarize in one paragraph."},{"role":"user","content":"Based on the student reviews, the teacher is highly regarded and considered to be very good at his job. The students have expressed their satisfaction with his teaching skills and knowledge."}],"max_tokens":50,"model":"gpt-3.5-turbo-0301","top_p":0.5}

Not only that. I ran thousands of manual tests like that. Repeat it often enough and the teacher will sometimes even be evaluated badly.
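One possible explanation for the second POST request above, which sends an earlier summary instead of the raw comments: the last loop in test() reads `$staff_data[$teacherName]`, where `$teacherName` is left over from the previous loop, and it writes the summary back into 'Descriptions'. This is a guess from the posted code only. A sketch of a loop that reads from the entry being iterated and stores the result under a separate key follows; `summarize()` is a stand-in for the real openai() call, and the 'Summary' key and sample data are invented.

```php
<?php
// Stand-in for the real openai() call so the sketch runs without network access.
function summarize(array $descriptions): string
{
    return 'Summary of ' . count($descriptions) . ' review(s)';
}

// Invented sample data in the same shape as $staff_data in test().
$staffData = [
    'Teacher A' => ['Name' => 'Teacher A', 'Descriptions' => ['Very good teacher', 'Knows his job']],
    'Teacher B' => ['Name' => 'Teacher B', 'Descriptions' => ['Explains clearly']],
];

foreach ($staffData as &$teacherData) {
    // Read from the entry being iterated, not from a key left over from an
    // earlier loop, and keep the raw comments untouched.
    $teacherData['Summary'] = summarize($teacherData['Descriptions']);
}
unset($teacherData); // break the reference so later code cannot overwrite the last entry

print_r($staffData);
```

Keeping the raw comments and the summary in separate keys means a summary can never be fed back into the model as if it were a student comment.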