The OpenAI GPT-3.5-turbo model does not consistently provide satisfactory responses

magbenyo · September 14, 2023, 8:12am

At times, the GPT-3.5-turbo model provides inaccurate descriptions of the student comments, occasionally presenting views that contradict the sentiments expressed by the students.

This is the code

public function openai($dataa)
    {
        // dd($dataa);
        $messages = [];
        $combinedContent = ""; // Initialize an empty string
        // dd($data);
        // Assuming $dataArray is your array of strings
        if (!is_array($dataa)) {
            // dd($dataa);
            $dataa = [$dataa];
            
        }
        foreach ($dataa as $item) {
            
            $combinedContent .= $item . " next Comment is  ";
        }
        $messages[] = ['role' => 'assistant', 'content' => "Please analyze the following list of student comments and provide a concise summary of the main themes, opinions, and feedback expressed by the students. The comments are as follows \n\"$combinedContent\" Please ensure that the summary captures the key points and sentiments expressed in these comments. Thank you!"];
        // dd($combinedContent);

        // Now, $messages contains the conversation data in the required format

        // Replace with your API key
        $apiKey = '';
        

        $client = new Client([
            'verify' => false, // Disable SSL verification
        ]);

        $response = $client->post('https://api.openai.com/v1/chat/completions', [
            'headers' => [
                'Authorization' => 'Bearer ' . $apiKey,
            ],
            'json' => [
                'messages' => $messages,
                'max_tokens' => 50, // Adjust the maximum length of the response as needed.
                'model' => "gpt-3.5-turbo",
            ],
        ]);

        $data = json_decode($response->getBody(), true);
        // dd($data);
        $description = $data['choices'][0]['message']['content'];
        // dd($description);
        return $description;
    }

_j · September 14, 2023, 8:38am

Problem 1: You are giving instructions in the “assistant” role. That is only for showing the AI past conversation turns, or advanced techniques like showing it how it should respond.

Instructing the AI should either be in a “system” or “user” role message.

Problem 2: you are not clearly delineating the instructions from the data. Here is typical use of roles:

system: Assess student reviews of teachers. Summarize in one paragraph.
user: student review: $combinedContent

Processing individual records is best, yet you have language such as “the following list”.

Another parameter you can include alongside the model is 'top_p' => 0.5, which restricts sampling to the most probable tokens (those making up the top 50% of probability mass), so unlikely word choices are dropped.

I assume you have other unseen code that actually evaluates and gives the score and the short ‘very very bad’.

If the AI is just not following instructions well at all, you can try gpt-3.5-turbo-0301, which hasn’t suffered degrading changes recently.
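
As a rough illustration of the role split described above, here is a minimal sketch that puts the instruction in a "system" message and the data in a "user" message, with each comment on its own labelled line. The function name `buildSummaryPayload` and the "Student review N:" labels are assumptions made for this example, not part of the original code.

```php
<?php
// Hypothetical helper: builds the request body for one teacher's comments.
// The instruction goes in the "system" message, the data in the "user" message.
function buildSummaryPayload(array $comments, string $model = 'gpt-3.5-turbo'): array
{
    // Label each comment on its own line so the data is clearly delimited.
    $lines = [];
    foreach (array_values($comments) as $i => $comment) {
        $lines[] = sprintf('Student review %d: %s', $i + 1, trim($comment));
    }

    return [
        'model' => $model,
        'top_p' => 0.5,
        'messages' => [
            ['role' => 'system', 'content' => 'Assess student reviews of teachers. Summarize in one paragraph.'],
            ['role' => 'user', 'content' => implode("\n", $lines)],
        ],
    ];
}

$payload = buildSummaryPayload(['He is a very good teacher', 'He knows his job']);
echo json_encode($payload, JSON_PRETTY_PRINT), PHP_EOL;
```

The returned array can be passed as the 'json' option of the existing Guzzle `post()` call.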

magbenyo · September 14, 2023, 10:24am

I made changes to the user role

public function openai($dataa)
    {
        // dd($dataa);
        $messages = [];
        $combinedContent = ""; // Initialize an empty string
        // dd($data);
        // Assuming $dataArray is your array of strings
        if (!is_array($dataa)) {
            // dd($dataa);
            $dataa = [$dataa];
            
        }
        foreach ($dataa as $item) {
            
            $combinedContent .= $item . "   ";
        }
        $messages[] = ['role' => 'user', 'content' => "Assess student reviews of teachers\n\"$combinedContent\"Summarize in one paragraph"];
        // dd($combinedContent);

        // Now, $messages contains the conversation data in the required format

        // Replace with your API key
        $apiKey = '';
        

        $client = new Client([
            'verify' => false, // Disable SSL verification
        ]);

        $response = $client->post('https://api.openai.com/v1/chat/completions', [
            'headers' => [
                'Authorization' => 'Bearer ' . $apiKey,
            ],
            'json' => [
                'messages' => $messages,
                'max_tokens' => 50, // Adjust the maximum length of the response as needed.
                'model' => "gpt-3.5-turbo",
                'top_p' => 0.5,
            ],
        ]);

        $data = json_decode($response->getBody(), true);
        // dd($data);
        $description = $data['choices'][0]['message']['content'];
        // dd($description);
        return $description;
    }

The output

jochenschultz · September 14, 2023, 11:34am

I don’t think you can get good results this way. They will either be wrong from the start or different every time.

You will have to define what is good and what is bad, and let the assessment be done by multiple agents that only have to answer yes or no. Then calculate a score from all the agents' results.

I am doing about the same for code assessments. But I am also giving weights, because what an organisation or a supervisor/manager considers good code may differ.

Sometimes they want to give readability of code a higher value over code performance or modularity or vice versa.
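
A minimal sketch of how such weighted yes/no results could be combined into one score. The criterion names, the weights, and the function `weightedScore` are made up for illustration; the actual aggregation used in the system described above is not shown in this thread.

```php
<?php
// Hypothetical aggregation of yes/no answers from several checks.
// $answers: criterion => list of booleans (true means the agent answered "yes")
// $weights: criterion => relative importance (defaults to 1.0 when missing)
function weightedScore(array $answers, array $weights): float
{
    $total = 0.0;
    $weightSum = 0.0;
    foreach ($answers as $criterion => $votes) {
        if (count($votes) === 0) {
            continue; // no answers for this criterion, skip it
        }
        $weight = $weights[$criterion] ?? 1.0;
        // Share of "yes" answers for this criterion, between 0 and 1.
        $yesShare = count(array_filter($votes)) / count($votes);
        $total += $weight * $yesShare;
        $weightSum += $weight;
    }
    // Weighted average of the yes-shares; 0.0 when nothing was answered.
    return $weightSum > 0 ? $total / $weightSum : 0.0;
}

$answers = [
    'explains_clearly' => [true, true, false, true],
    'knows_subject'    => [true, true, true, true],
];
$weights = ['explains_clearly' => 2.0, 'knows_subject' => 1.0];

$score = weightedScore($answers, $weights);
printf("%.2f\n", $score); // prints 0.83
```

Changing the weights shifts the score without re-running any of the yes/no checks.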

So I am running about 3000 agents on a single programmer's code evaluation to get good results with an accurate score value.

The more specific the task you give the model, and the better you know what the model is capable of and where you need something other than GPT, the better the end results.

I don’t think it is possible to do what you are trying to do there even with GPT-4.

And I really hope that you are taking this as seriously as I do. A score value might be used against someone somewhere, e.g. for mass layoffs. Do you want to lose a job because a wrong evaluation result put you in the bottom 10%? I am not allowing this on my system; I am using it to suggest or create courses for upskilling.

API cost for a single quality evaluation to create an accurate skill matrix is about $10 per developer.

_j · September 14, 2023, 11:48am

I’d produce a log of the input text right before the prompt is filled in with it, or create a variable for “user_role_json” that is fully-assembled, and log that.

If the reduction of top_p gives you the same results every time, you might have no “student review” input at all.

Too much is missing to evaluate the quality otherwise.
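
One way the logging suggestion above might look, as a sketch only. The helper names `assembleRequestJson` and `hasEmptyUserInput` are hypothetical, and `error_log` is used here simply because it needs no framework; in Laravel a logger call would serve the same purpose.

```php
<?php
// Hypothetical helper (not part of the original code): returns the fully
// assembled request body as a JSON string so it can be logged before sending.
function assembleRequestJson(array $messages, string $model): string
{
    return json_encode(
        ['model' => $model, 'messages' => $messages],
        JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE
    );
}

// Hypothetical check: true when no user message carries any non-blank text.
function hasEmptyUserInput(array $messages): bool
{
    foreach ($messages as $message) {
        if ($message['role'] === 'user' && trim($message['content']) !== '') {
            return false;
        }
    }
    return true;
}

$messages = [
    ['role' => 'system', 'content' => 'Assess student reviews of teachers. Summarize in one paragraph.'],
    ['role' => 'user', 'content' => ''], // simulates a missing review
];

// Log exactly what would be sent, then flag the empty-input case.
error_log(assembleRequestJson($messages, 'gpt-3.5-turbo'));
if (hasEmptyUserInput($messages)) {
    error_log('Warning: no student review text reached the prompt.');
}
```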

magbenyo · September 14, 2023, 11:58am

I think you are right. After modifying my code:

public function test()
    {
        $activeConfiguration = configuration::where('active', 1)->first();
        $data = Feedback::all()->where('hook', $activeConfiguration->vaule);


        $staff_data = [];

        foreach ($data as $value) {
            $staff = Staff::where('id', $value->staff_id)->first();

            if ($staff) {
                $teacherName = $staff->Name;
                $rating = $value->Rating;
                $description = $value->Comment;

                // Check if the teacher name already exists in $staff_data
                if (array_key_exists($teacherName, $staff_data)) {
                    // If it does, add the rating to the existing entry
                    $staff_data[$teacherName]['Rating'] += $rating;
                    $staff_data[$teacherName]['Descriptions'][] = $description;
                } else {
                    // If it doesn't, create a new entry
                    $staff_data[$teacherName] = [
                        'Name' => $teacherName,
                        'Rating' => $rating,
                        'Descriptions' => [$description],
                        'Data_Descriptions' => [$description],
                        // 'Description' => $this->openai(),
                    ];
                }
            }
        }

        // dd($this->openai($staff_data[$teacherName]['Descriptions']));
        // Modify the 'Descriptions' data for each teacher (e.g., concatenate all descriptions into one string)
        foreach ($staff_data as &$teacherData) {
            // dd($staff_data[$teacherName]['Descriptions']);
            $teacherData['Descriptions'] = $this->openai($staff_data[$teacherName]['Descriptions']);
        }

public function openai($descriptions)
{
    // Ensure descriptions is an array
    $descriptions = is_array($descriptions) ? $descriptions : [$descriptions];
    
    // Prepare the conversation messages
    $messages = [
        ['role' => 'system', 'content' => 'Assess student reviews of teachers. Summarize in one paragraph.'],
        ['role' => 'user', 'content' => implode(" Students next remarks:  ", $descriptions)],
    ];
    dd($messages);


    // Set your API key
    $apiKey = '';

    // Make the API request to OpenAI
    $response = $this->makeOpenAIRequest($messages, $apiKey);

    // Extract and return the description from the response
    return $response['choices'][0]['message']['content'];
}

private function makeOpenAIRequest($messages, $apiKey)
{
    $client = new Client([
        'verify' => false, // Disable SSL verification
        'proxy' => 'http://127.0.0.1:8080', // Replace with your proxy settings
    ]);

    $response = $client->post('https://api.openai.com/v1/chat/completions', [
        'headers' => [
            'Authorization' => 'Bearer ' . $apiKey,
        ],
        'json' => [
            'messages' => $messages,
            'max_tokens' => 50, // Adjust as needed
            'model' => 'gpt-3.5-turbo-0301',
            'top_p' => 0.5,
        ],
    ]);

    return json_decode($response->getBody(), true);
}

These are the API requests sent to OpenAI.
First POST request

POST /v1/chat/completions HTTP/2
Host: api.openai.com
User-Agent: GuzzleHttp/7
Content-Type: application/json
Authorization: Bearer 
Content-Length: 276

{"messages":[{"role":"system","content":"Assess student reviews of teachers. Summarize in one paragraph."},{"role":"user","content":"He is a Very good Teacher Students next remarks:  He is Very Good And Know his Job"}],"max_tokens":50,"model":"gpt-3.5-turbo-0301","top_p":0.5}

Second POST request

POST /v1/chat/completions HTTP/2
Host: api.openai.com
User-Agent: GuzzleHttp/7
Content-Type: application/json
Authorization: Bearer 
Content-Length: 384

{"messages":[{"role":"system","content":"Assess student reviews of teachers. Summarize in one paragraph."},{"role":"user","content":"Based on the student reviews, the teacher is highly regarded and considered to be very good at his job. The students have expressed their satisfaction with his teaching skills and knowledge."}],"max_tokens":50,"model":"gpt-3.5-turbo-0301","top_p":0.5}
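
The user content of the second request is a generated summary, not raw student comments. Judging only from the `test()` code shown, one possible cause is that the second `foreach` passes `$staff_data[$teacherName]['Descriptions']`, where `$teacherName` still holds whatever value the first loop left in it, instead of the current entry's own descriptions. Below is a sketch of the loop reading from the current entry. The `$summarize` closure is a stand-in for `$this->openai()`, and the sample data for "Teacher B" is invented for the example.

```php
<?php
// Sample data in the same shape as $staff_data (Teacher B is made up).
$staff_data = [
    'Teacher A' => [
        'Name' => 'Teacher A',
        'Descriptions' => ['He is a Very good Teacher', 'He is Very Good And Know his Job'],
    ],
    'Teacher B' => [
        'Name' => 'Teacher B',
        'Descriptions' => ['Explains slowly'],
    ],
];

// Stand-in for $this->openai(): records its input instead of calling the API.
$received = [];
$summarize = function (array $descriptions) use (&$received): string {
    $received[] = $descriptions;
    return 'summary of ' . count($descriptions) . ' comment(s)';
};

foreach ($staff_data as &$teacherData) {
    // Use the current entry, not $staff_data[$teacherName] from the earlier loop.
    $teacherData['Descriptions'] = $summarize($teacherData['Descriptions']);
}
unset($teacherData); // break the reference left behind by foreach

print_r($staff_data);
```

With this change each teacher's own comments are sent once, and no summary is fed back in as input.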

jochenschultz · September 14, 2023, 12:01pm

Not only that. I made thousands of manual tests like that. Repeat it often enough and the teacher will sometimes even be evaluated badly.
