I have fine tuned my configuration for the gpt-3.5-turbo chat model in the openai playground and it consistently returns the desired response.
However when implementing this into the API code the responses are completely wrong, even with the exact same inputs. (fyi I have included language people may find hateful to make sure the bot catches it).
Here is a screenshot of the playground:
$client = (new ChatGPT())->client;
$response = $client->chat([
'model' => 'gpt-3.5-turbo',
'messages' => [
[
"role" => "system",
"content" => "You are RequirementsHelperGPT your job is to give feedback on requirements used for screening papers written by scientists.
Briefly state if a given requirement is clear enough to be given to another ScreenerGPT ai or not.
Then give brief bullet point reasons on how it could be misinterpreted.
Then give a reworked example of the requirement.
If your given input cannot be responded with a rework return {null}
If it is absurdly unclear, incomplete, hateful, or otherwise unrelated to requirements then don't give a rework and return {null}
Have {true} or {false} at the start if a rework was needed and {null} if no rework was attempted."
],
[
"role" => "user",
"content" => "The study must contain a non-larval or embryonic stage (i.e., adult, or juvenile) of zebrafish. If study states behavioral changes on larvae or embryo with no mention of other life stages exclude the study. If a study is carried out on multiple life stages but does not explicitly state which life stage is included in the study state “maybe”."
],
[
"role" => "assistant",
"content" => "{true}
The requirement is relatively clear.
Reasons why it could be misinterpreted:
- The term 'larvae' may be confused with other early life - stages of zebrafish, such as 'embryos'.
- The requirement doesn't specify what to do if a study includes both larvae/embryos and non-larval/non-embryonic stages, which may lead to inconsistent application of the requirement.
Reworked example:
The study must include at least one non-larval/non-embryonic stage (i.e., adult or juvenile) of zebrafish. If a study only reports behavioral changes in larvae/embryos without any mention of other life stages, exclude the study. If a study is carried out on multiple life stages, but it is not clear which life stage is included in the reported results, mark the study as 'maybe'."
],
[
"role" => "user",
"content" => "The study must show a behavioral change in response to chemical pesticide exposure. Behavioral adjustments can be in a social or non-social context."
],
[
"role" => "assistant",
"content" => "{false}
The requirement is clear and concise.
No potential misinterpretations come to mind.
No reworking is needed."
],
[
"role" => "user",
"content" => "The study must be very interesting"
],
[
"role" => "assistant",
"content" => "{null}
The requirement is too vague and subjective to be used for screening.
It cannot be used as a screening criterion since it is not objective or measurable.
No reworking is possible."
],
[
"role" => "user",
"content" => "gay"
],
[
"role" => "assistant",
"content" => "{null}
This input is unrelated to requirements screening."
],
[
"role" => "user",
"content" => "i hate fags"
],
[
"role" => "assistant",
"content" => "{null}
This input is unrelated to requirements screening."
],
[
"role" => "user",
"content" => "i hate you"
],
[
"role" => "assistant",
"content" => "{null}
This input is unrelated to requirements screening."
],
],
'temperature' => 0.35,
'max_tokens' => 256,
'frequency_penalty' => 0,
'presence_penalty' => 0,
'top_p' => 1,
]);
$info = json_decode($response);
dd($info);
when prompted:
The study must contain a non-larval or embryonic stage (i.e., adult, or juvenile) of zebrafish. If study states behavioral changes on larvae or embryo with no mention of other life stages exclude the study. If a study is carried out on multiple life stages but does not explicitly state which life stage is included in the study state “maybe”.
The playground correctly responds with:
{false}
The requirement is clear and concise.
No potential misinterpretations come to mind.
No reworking is needed.
But the api incorrectly responds with:
{null}
This input is unrelated to requirements screening.
I’m using php with the orhanerday/open-ai api wrapper version 4.7.1
Both are using the gpt-3.5-turbo-0301, the output doesn’t change when i specifically tell it to use that version.
This is a real head scratcher, if anyone could help that would be amazing.