Getting consistent responses across different API calls to ChatGPT

I’m working on an app that users can use to find out their level in subjects like math or English (for example) through multiple-choice tests divided into 5 levels per subject.

I’m currently working on a “Know My Level” feature that, through an API call (passing some instructions to ChatGPT along with the list of the user’s records and the list of all test types and subjects the app includes), generates a report based on an HTML template I also provide.

I’m facing several problems. One of them is that the response is often just a copy-paste of my template with no relevant information in it (or very little), but the most important problem is the inconsistency of the results.

I know I cannot expect every response to be identical or to follow exactly the same structure, but that is more or less the idea, or at least something close to it. This is why I created a template: to make ChatGPT’s responses follow a consistent pattern. Otherwise, every time the same user taps “Know My Level” they will see a different response (regardless of whether the response is correct or not).

I’m working in Kotlin (it’s an Android app), and I’ve tried many different combinations of inputs to improve the results, but I keep getting more or less the same behaviour, with only small improvements.

Maybe someone experienced here can give me some hints on how to better formulate the request to achieve my goal.

This is my current request; I replace the {} sections with the corresponding information in code (a simplified sketch of the substitution follows the prompt):

Below is my list of records in {subject}, followed by the tests and subjects the app includes. I need to know my level in the given subject, so please analyze the information I’m providing in depth and complete the HTML template with your results, following my instructions:

This is my list of records with their corresponding scores.

{records}

Next are the tests and subjects the app includes:

{tests}

And finally, this is the HTML template you will use to create the report with your results:

Instructions:

  1. Generate a summary of scores and replace [summary] with your summary.

  2. Generate a statistical analysis and replace [statisticalanalysis] with your result.

  3. Generate a score analysis and replace [scoreanalysis] with your result.

  4. Generate interpretation of results and replace [interpretationofresults] with your response.

  5. Generate evaluation of performance and replace [evaluationofperformance] with your results.

  6. Generate strengths and replace [strengths] with the strengths you generated.

  7. Generate areas of improvement and replace [areasofimprovement] with your results.

  8. Generate recommendations and replace [recommendations] with your results.

  9. Generate a conclusion and replace [conclusion] with your results.

  10. Evaluate my level in the subject (for example, beginner, intermediate, or advanced) and replace [level] with your evaluation.

<!DOCTYPE html>
<html>
<head>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 0 auto;
            padding: 30px;
            font-size: 1.5em;
        }
        h2 {
            color: #333;
            margin-top: 20px;
        }
        p {
            margin-top: 10px;
        }
        table {
            border-collapse: collapse;
            width: 100%;
            margin-bottom: 20px;
        }
        th, td {
            padding: 8px;
            text-align: left;
            border-bottom: 1px solid #ddd;
        }
        th {
            background-color: #f2f2f2;
        }
        .chart-container {
            margin-top: 20px;
        }
        * {
            margin: 0;
            padding: 0;
        }
        .imgbox {
            display: grid;
            height: 100%;
        }
        .center-fit {
            max-width: 100%;
            max-height: 100vh;
            margin: auto;
        }
    </style>
</head>
<body>
    <h1>Math Test Evaluation Report</h1>
    
    <h2>1. Summary of Scores</h2>
    <p>[summary]</p>
    
    <h2>2. Statistical Analysis</h2>
    <p>[statisticalanalysis]</p>
    
    <h2>3. Score Analysis</h2>
    <p>[scoreanalysis]</p>
    
    <h2>4. Interpretation of Results</h2>
    <p>[interpretationofresults]</p>
    
    <h2>5. Evaluation of Performance</h2>
    <p>[evaluationofperformance]</p>
    
    <h2>6. Strengths</h2>
    <p>[strengths]</p>
    
    <h2>7. Areas for Improvement</h2>
    <p>[areasofimprovement]</p>
    
    <h2>8. Recommendations</h2>
    <p>[recommendations]</p>
    
    <h2>9. Conclusion</h2>
    <p>[conclusion]</p>
    
    <h2>10. Your Level</h2>
    <p>[level]</p>
</body>
</html>

I expect your result to be this same HTML, but with all sections completed with the required information.
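
For context, the placeholder substitution looks roughly like this (a simplified sketch; the names are illustrative, not my exact code):

// Simplified sketch of how the {} placeholders get filled before the API call (names are illustrative).
data class Record(val testName: String, val score: Int)

fun buildPrompt(template: String, subject: String, records: List<Record>, tests: List<String>): String {
    // Flatten the user's records into the "Test → score" list shown above.
    val recordsText = records.joinToString(", ", prefix = "[", postfix = "]") { "${it.testName} → ${it.score}" }
    // One test/subject name per line.
    val testsText = tests.joinToString("\n")
    return template
        .replace("{subject}", subject)
        .replace("{records}", recordsText)
        .replace("{tests}", testsText)
}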

Here is an example of records and test types, to give you more context:

User Records:

[SAT Math Level 1 Test 1 → 80, SAT Math Level 1 Test 1 → 80, SAT Math Level 1 Test 1 → 40, SAT Math Level 1 Test 1 → 100, SAT Math Level 1 Test 1 → 100, SAT Math Level 1 Test 1 → 60, SAT Math Level 1 Test 1 → 100, SAT Math Level 1 Test 1 → 0, SAT Math Level 1 Test 1 → 40, SAT Math Level 1 Test 1 → 20, SAT Math Level 1 Test 1 → 40, SAT Math Level 1 Test 1 → 40, SAT Math Level 1 Test 1 → 40, SAT Math Level 1 Test 1 → 40, SAT Math Level 1 Test 1 → 100, SAT Math Level 1 Test 1 → 100, SAT Math Level 1 Test 1 → 20, SAT Math Level 1 Test 1 → 100, SAT Math Level 1 Test 1 → 100, SAT Math Level 1 Test 1 → 60, SAT Math Level 1 Test 1 → 100, SAT Math Level 1 Test 1 → 100]

Tests:

ACT Science
AI ACT Science
SAT Reading
ACT Math
AI GRE Verbal Reasoning… and more

Maybe I’m giving too much information at once; I’m not sure. As far as I know, I cannot ask more than one question (one ChatMessage in Kotlin) in the same request, but I’m open to shortening my request and including fewer sections if that would help.

For context, in one particular request ChatGPT gave a very good response; that is more or less what I expect every time.

It’s all in the prompt. Have you tried asking ChatGPT to help create a prompt for you?

You have quite a bit of information to go off of.


val prompt = """
You are tasked with generating a comprehensive report based on the provided data. The report will evaluate the user’s level in a specific subject by analyzing their test scores. Follow the instructions meticulously and use the provided HTML template to structure your response.

User Data

Below is a list of the user’s records with their corresponding scores:
{records}

Test and Subject Data

These are the tests and subjects the app includes:
{tests}

Instructions:

  1. Summary of Scores: Generate a detailed summary of the user’s scores and replace [summary] with your summary.
  2. Statistical Analysis: Conduct a thorough statistical analysis of the user’s scores and replace [statisticalanalysis] with your findings.
  3. Score Analysis: Provide a detailed analysis of the user’s scores and replace [scoreanalysis] with your analysis.
  4. Interpretation of Results: Offer a comprehensive interpretation of the user’s results and replace [interpretationofresults] with your interpretation.
  5. Evaluation of Performance: Evaluate the user’s performance in detail and replace [evaluationofperformance] with your evaluation.
  6. Strengths: Identify the user’s strengths based on the data and replace [strengths] with your findings.
  7. Areas for Improvement: Highlight specific areas where the user can improve and replace [areasofimprovement] with your suggestions.
  8. Recommendations: Provide actionable recommendations for the user to improve their skills and replace [recommendations] with your advice.
  9. Conclusion: Summarize the overall evaluation and replace [conclusion] with your summary.
  10. Level Evaluation: Assess the user’s level in the subject (e.g., beginner, intermediate, advanced) and replace [level] with your evaluation.

HTML Template

Use the following HTML template to structure your report. Replace the placeholders with the appropriate content generated from the instructions above:

<!DOCTYPE html>
<html>
<head>
    <style>
        body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 30px; font-size: 1.5em; }
        h2 { color: #333; margin-top: 20px; }
        p { margin-top: 10px; }
        table { border-collapse: collapse; width: 100%; margin-bottom: 20px; }
        th, td { padding: 8px; text-align: left; border-bottom: 1px solid #ddd; }
        th { background-color: #f2f2f2; }
        .chart-container { margin-top: 20px; }
        * { margin: 0; padding: 0; }
        .imgbox { display: grid; height: 100%; }
        .center-fit { max-width: 100%; max-height: 100vh; margin: auto; }
    </style>
</head>
<body>

<h1>Subject Test Evaluation Report</h1>

<h2>1. Summary of Scores</h2>
<p>[summary]</p>

<h2>2. Statistical Analysis</h2>
<p>[statisticalanalysis]</p>

<h2>3. Score Analysis</h2>
<p>[scoreanalysis]</p>

<h2>4. Interpretation of Results</h2>
<p>[interpretationofresults]</p>

<h2>5. Evaluation of Performance</h2>
<p>[evaluationofperformance]</p>

<h2>6. Strengths</h2>
<p>[strengths]</p>

<h2>7. Areas for Improvement</h2>
<p>[areasofimprovement]</p>

<h2>8. Recommendations</h2>
<p>[recommendations]</p>

<h2>9. Conclusion</h2>
<p>[conclusion]</p>

<h2>10. Your Level</h2>
<p>[level]</p>
</body>
</html>

Validation:

Ensure the response follows this HTML template with all placeholders appropriately filled. If the response does not adhere to the template or lacks detail, please refine the input and retry.

I expect your results will be this same HTML but with all sections completed with the required information.
"""

val response = apiClient.complete(prompt, /* other parameters */)
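
To back up the Validation step, you could also check the reply on the client side and retry when the model just echoes the template. A minimal sketch (the placeholder names come from the template above; requestReport is a hypothetical wrapper around your API call, and the retry policy is up to you):

// Client-side check: accept the report only if none of the template placeholders survived.
val placeholders = listOf(
    "[summary]", "[statisticalanalysis]", "[scoreanalysis]", "[interpretationofresults]",
    "[evaluationofperformance]", "[strengths]", "[areasofimprovement]",
    "[recommendations]", "[conclusion]", "[level]"
)

fun isReportComplete(html: String): Boolean = placeholders.none { html.contains(it) }

// requestReport is a hypothetical wrapper around your API call that returns the raw HTML text.
suspend fun generateReport(requestReport: suspend () -> String, maxAttempts: Int = 3): String? {
    repeat(maxAttempts) {
        val html = requestReport()
        if (isReportComplete(html)) return html
    }
    return null // show the user a friendly error instead of an empty template
}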

Thank you very much for your reply, @martinrobson. I did ask ChatGPT what the best way to formulate my request would be, and it was ChatGPT itself that suggested using a template to narrow the responses, so that’s what I did, but it is visibly not enough.

Your example prompt looks precise (and concise), but let me share the results of my six attempts using it.

API call 1: No data in the placeholders.

API call 2: Looks good!

API call 3: Same as call 2, looks good!

API calls 4 and 5: Same as call 1, bad.

API call 6: Looks definitely bad.

In conclusion, I still cannot get consistent responses, and I cannot rely on this, because I’m getting roughly one correct response for every three API requests.

I cannot present this to the final user.

Given these results, I’m not very optimistic that I’ll be able to deal with this; unfortunately, AI still fails often.

I thought one possibility could be to ask for every section separately, to ensure a precise response, and then build the report myself by joining the parts, but of course I cannot make 10 API requests for every button click, so in the end that is not an option.

Any other suggestions? Or should I, sadly, forget about implementing this feature?

OK @martinrobson, I kept researching the subject and asked ChatGPT again about the best way to format my request, and I finally got what I was looking for.

The key was the format in which I was sending the user’s info: it was a plain string, but ChatGPT seems to handle JSON-formatted text much better. Once I changed the user data from a string to a JSON string and included it in your example prompt, I started getting consistent responses (like the good example I mentioned earlier).
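
Roughly, the change looks like this (a simplified sketch using kotlinx.serialization, assuming the serialization plugin is set up; the data class and field names are illustrative, not my real model):

import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

// Illustrative record model; the real app model will differ.
@Serializable
data class UserRecord(val test: String, val level: Int, val score: Int)

// Turns the records into a JSON array string, e.g. [{"test":"SAT Math","level":1,"score":80}, ...]
fun recordsAsJson(records: List<UserRecord>): String = Json.encodeToString(records)

// This JSON string now replaces {records} in the prompt instead of the old "Test → score" text.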

Of course this still needs much more testing and development/improvement, but now I know I’m on the right track and getting much better results.

I’ll let you know once deeper testing has been done.

Thanks again for your input and time 🙂

Field mapping… the struggle is real… field types, data length… and then, as you’re seeing, cleaning up generative AI responses… You’ll need a lot of prompt work to make it consistent.

Try making an assistant.

Happy it’s working

Prompt engineering can only do so much, especially in this case, i.e. in one take. You could also try providing more few-shot examples. Have you tried breaking it up and chaining some prompts together to see if it does a better job of what you want?

If you want an extremely deterministic outcome, you need to go the programmatic route, where you define Pydantic models for the output schema, etc. If you’re looking for lower-code alternatives, you could try Langflow or Flowise, but if you’re not into programming, that is probably still a big jump.
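
In Kotlin, a rough equivalent would be to define the expected output schema as a serializable data class, ask the model to answer with JSON only, and parse/validate the reply yourself before rendering the HTML locally. A minimal sketch (class and field names are illustrative):

import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json

// Kotlin counterpart of a Pydantic output model: the sections the prompt asks the model to return as JSON.
@Serializable
data class LevelReport(
    val summary: String,
    val statisticalanalysis: String,
    val scoreanalysis: String,
    val interpretationofresults: String,
    val evaluationofperformance: String,
    val strengths: String,
    val areasofimprovement: String,
    val recommendations: String,
    val conclusion: String,
    val level: String
)

private val reportJson = Json { ignoreUnknownKeys = true }

// Throws on malformed or incomplete JSON, so the caller can retry instead of showing a broken report;
// the HTML is then rendered locally from the parsed fields.
fun parseReport(modelReply: String): LevelReport = reportJson.decodeFromString(modelReply)

If the client library you use also exposes the API’s JSON response format option, combining it with a schema like this gets you most of the way to a deterministic structure.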

Hello @kevin.dragan. Firstly, thank you very much for your input.

I’m a software engineer, and, among other languages, I develop for Android with Kotlin and Java, so it’s not a question of being into programming or not. I love programming, but for personal reasons I’m currently looking for an “easy to implement” solution.

Creating one prompt for each section of the report and chaining them myself is something that had crossed my mind, but I wasn’t completely sure it was possible. Now I know that you can send several messages (in Kotlin, each is a “ChatMessage” object with a user prompt) at once and get a response, so it’s an option worth trying.

I’ll give it a try and let you know.

Did you get a chance to look at the seed parameter under the Chat Completions API? Check it out and test it to see if it is of any use for your use case. Cheers!
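
If your Kotlin client exposes it, the request change is small. A sketch (assuming your library version has the seed parameter; seed gives best-effort reproducibility, not a guarantee, and a low temperature also helps):

// Sketch only: pin the sampling as much as the API allows. `seed` is best-effort reproducibility,
// and it is only available if your client library version exposes the parameter.
val deterministicRequest = ChatCompletionRequest(
    model = ModelId("gpt-4-turbo"),
    messages = listOf(
        ChatMessage(role = ChatRole.System, content = "…your system instructions…"),
        ChatMessage(role = ChatRole.User, content = "…your prompt with {records}, {tests}, {subject} filled in…")
    ),
    temperature = 0.0,
    seed = 12345
)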

Hello @kevin.dragan and @martinrobson. I’m trying to ask multiple questions in the same request (as Kevin suggested) to get more focused results for each section, but the problem is that ChatGPT only answers my first question. Any idea why? (I use Kotlin.)

This is my request:

val openAI = OpenAI(AppSettings.props.chatGPT.chatGPTApiSecretKey, lc, Timeout(Duration.INFINITE))

val chatCompletionRequest = ChatCompletionRequest(
    model = ModelId(AppSettings.props.chatGPT.gptModel),
    messages = listOf(
        ChatMessage(
            role = ChatRole.System,
            content = systemText
        ),
        ChatMessage(
            role = ChatRole.User,
            content = userText
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 1: Summary of Scores.\n" +
                    "Generate a detailed summary of the user’s scores and use \"summaryofscores\" as a node name for this question."
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 2: Statistical Analysis.\n" +
                    "Conduct a thorough statistical analysis of the user’s scores and use \"statisticalanalysis\" as a node name for this question.\n"
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 3: Score Analysis.\n" +
                    "Provide a detailed analysis of the user’s scores and use \"scoreanalysis\" as a node name for this question."
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 4: Interpretation of Results.\n" +
                    "Offer a comprehensive interpretation of the user’s results and use \"interpretationofresults\" as a node name for this question."
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 5: Evaluation of Performance.\n" +
                    "Evaluate the user’s performance in detail and use \"evaluationofperformance\" as a node name for this question."
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 6: Strengths.\n" +
                    "Identify the user’s strengths based on the data and use \"strengths\" as a node name for this question."
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 7: Areas for Improvement.\n" +
                    "Highlight specific areas where the user can improve and use \"areasofimprovement\" as a node name for this question."
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 8: Recommendations.\n" +
                    "Provide actionable recommendations for the user to improve their skills and use \"recommendations\" as a node name for this question."
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 9: Conclusion.\n" +
                    "Summarize the overall evaluation and use \"conclusion\" as a node name for this question."
        ),
        ChatMessage(
            role = ChatRole.User,
            content = "Question 10: Level Evaluation.\n" +
                    "Assess the user’s level in the subject (e.g., beginner, intermediate, advanced) and use \"level\" as a node name for this question."
        )
    )
)
// A single non-streaming completion is enough here; the streaming chatCompletions() call was unused.
val completion = openAI.chatCompletion(chatCompletionRequest)
val response = completion.choices[0].message

OK, I found the issue. Instead of deleting the question, I’ll answer it myself in case it helps anyone else facing the same problem.

My problem was really silly: I was mistakenly using the “gpt-3.5-turbo” model instead of “gpt-4-turbo” (the one I’m paying for).

Apparently sending multiple questions in the same request does not work with the 3.5 model, or at least it wasn’t working for me, but after switching to gpt-4-turbo it started working as expected.

Problem solved.

Good to hear. I was going to suggest you’d need to go the programmatic route. FYI, you could look at LangChain, as it’s literally designed for chaining actions together by feeding one process into the next, so you could chain 10 sequential LLM calls if needed, beyond just a message list. Separately, try the ‘gpt-4o’ model parameter; it’s much cheaper if it works for you (better in some ways, worse in others). Cheers.
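
Even without LangChain, in plain Kotlin the chaining can be a simple loop over the sections, assembling the report locally. A rough sketch using the same client classes as your snippet above (the section prompts and the join step are illustrative):

// Rough sketch: one focused call per section, then the report is assembled locally.
suspend fun buildReportByChaining(
    openAI: OpenAI,
    model: String,
    systemText: String,   // the system instructions you already use
    userText: String      // the user data (records, tests, subject)
): String {
    val sections = listOf(
        "Summary of Scores", "Statistical Analysis", "Score Analysis",
        "Interpretation of Results", "Evaluation of Performance", "Strengths",
        "Areas for Improvement", "Recommendations", "Conclusion", "Level Evaluation"
    )
    val parts = sections.map { section ->
        val request = ChatCompletionRequest(
            model = ModelId(model),
            messages = listOf(
                ChatMessage(role = ChatRole.System, content = systemText),
                ChatMessage(role = ChatRole.User, content = userText),
                ChatMessage(role = ChatRole.User, content = "Write only the \"$section\" section of the report, as plain text.")
            )
        )
        val answer = openAI.chatCompletion(request).choices.first().message.content.orEmpty()
        "<h2>$section</h2>\n<p>$answer</p>"
    }
    return parts.joinToString("\n\n")
}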
