GPT to judge, select, and reprint items from a list?

I’m having a very difficult time getting GPT-3.5 (or 4) to do what seems like a very simple task: analyze a list of suggestions and reprint only some of them.

My program returns a long list of suggestions on a screenplay. The suggestions are in chronological order and the total number of suggestions ranges anywhere from 5 to 40.

Then I run the list through a prompt that reads these suggestions and reprints the 10 best ones. Importantly, I want the selected suggestions to be evenly distributed across the list (suggestions from the beginning as well as the end).

However, the prompt keeps ignoring the later suggestions in the list. When I move the same suggestion earlier in the list, it gets chosen, but not when it appears later.

This seems simple, but I am literally unable to get the model to give equal attention across the list of suggestions.

If anyone is curious, here is a link to my prompt in the playground.

Can anyone give guidance based on previous work? Here’s my prompt in plain text as well:

You are a note transcription expert. You will be given a list of notes and you must select and reprint the best 10 notes in uniform formatting.

Read the following script notes and reprint the best 10 script notes with uniform formatting. Judge the 10 best based on detail and how well they address the feedback.

Can we make Layken more rough around the edges? We'd love to see her put up a rough, sarcastic exterior to the world that she uses to hide her tender interior.

- Read all the script notes. 
- Rank the 10 best script notes based on how well they address the FEEDBACK.
- Reprint the 10 highest ranked script notes.

Make sure to tell it to be concise and to use only items from your list, then fine-tune from those parameters.

Done with the -16k model.


You are TheatreBot, a script writer and script note evaluator.


I requested that different script editors make suggestions that implement this improvement: “xxxxx”

We received too many script note recommendations, though. Goal: the AI must choose and output only the ten scenes with the best-quality recommendations for implementing the improvement. However, at least one recommendation from each section must be included in the top 10.




Based on the given recommendations, here are the top 10 scenes that implement the improvement of making Layken more rough around the edges with a sarcastic exterior:

  1. Scene: Scene with Eddie and Layken
    Recommendation: Layken replies sarcastically to Eddie’s insights about life, showcasing her rough exterior and adding humor to the conversation.

  2. Scene: Scene in Mr. Cooper’s Class
    Recommendation: Layken uses sarcasm to hide her true feelings when answering Mr. Cooper’s question about poetry, further emphasizing her rough exterior.

  3. Scene: Scene with Will and Layken in the backyard
    Recommendation: Layken responds with a sarcastic remark when Will talks about understanding her mother, showing her rough exterior and adding humor to the conversation.

  4. Scene: Scene with Julia and Layken
    Recommendation: Layken responds sarcastically to Julia’s suggestion about carving pumpkins, highlighting her rough exterior and resistance to accepting her mother’s condition.

  5. Scene: LAYKEN, WILL - Page 52
    Recommendation: Layken stands up for herself with a forceful and sarcastic dialogue, emphasizing her rough edge and adding intensity to the scene.

    Recommendation: Layken talks sarcastically about her situation with Will, showcasing her rough exterior and adding a touch of cynicism.

    Recommendation: Layken reacts with anger and sarcasm when learning about her mother’s cancer, displaying her tough exterior and reluctance to show vulnerability.

  8. Scene: INT. CAFETERIA - DAY
    Recommendation: Layken demonstrates disappointment with Gavin’s plans for Eddie’s birthday with sarcastic dialogues, showcasing her rough exterior and wit.

  9. Scene: INT. SLAM CLUB - NIGHT
    Recommendation: Layken reacts with a sarcastic comment upon hearing Will’s voice at the Slam Club, further emphasizing her rough exterior and adding humor to the scene.

  10. Scene: INT. CLASSROOM
    Recommendation: Layken interjects with a sarcastic comment when Eddie asks if Will is going to read his own poetry, showcasing her rough exterior and adding a touch of cynicism.

These scenes provide a good balance of showcasing Layken’s rough exterior and incorporating sarcasm into her dialogue, while also including at least one recommendation from each section.


There’s no problem being concise or selecting items from the list.

It just shows a strong bias for the beginning of the list that skews the results.

Thanks for the attempt! However, it still ignores all the scenes towards the end (the Denny’s scene, the therapist’s office, etc.), which we planted to be clear winners for best notes. For whatever reason, I can’t get the model to pay attention to the end of the list.

It is indeed documented that models understand content in the middle of a long context less reliably than content near the start or end, and the more the AI writes, the more the end of your list becomes the middle.

So if you really want a technique that is unbiased by position, you can rotate the sections (and the notes within them): move the top section to the bottom one section at a time, run the prompt after each move, and from the resulting 6 or 12 API runs (or however many it takes to get back to the initial order), choose the final 10 by how often each note appears across the per-run top-10 lists.
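A minimal Python sketch of that rotate-and-vote idea, assuming you wrap your existing "reprint the best 10" prompt and API call in a function `select_top` (a hypothetical name, not an OpenAI API) that takes the flattened list of notes and returns the ones the model picked. Only the section order is rotated here; rotating the notes inside each section would work the same way.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical: a selector wraps your "reprint the best 10" prompt + API call
# and returns the notes the model picked from the list it was given.
Selector = Callable[[List[str]], List[str]]


def rotation_vote(sections: List[List[str]], select_top: Selector, k: int = 10) -> List[str]:
    """Run the selector once per rotation of the section order and return
    the k notes that appeared most often across all runs."""
    votes: Counter = Counter()
    order = list(sections)
    # After len(sections) rotations the list is back in its original order.
    for _ in range(len(sections)):
        flat = [note for section in order for note in section]
        votes.update(select_top(flat))
        order = order[1:] + order[:1]  # pop the top section and put it at the bottom
    # Notes with equal vote counts keep first-seen order (Counter.most_common).
    return [note for note, _count in votes.most_common(k)]


def biased_first_two(notes: List[str]) -> List[str]:
    """Stand-in for a model call that only ever looks at the start of the list."""
    return notes[:2]


sections = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
print(rotation_vote(sections, biased_first_two, k=6))
```

The stub selector is deliberately position-biased, yet every section still collects votes because each one spends a turn at the top of the list.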

Another approach is to score the notes individually, giving the AI reference scenes with example scores as anchor points.

One more thing: if you really want the AI to be a judge (and you aren’t simply judging the AI), give it the original section along with the proposed modifications. Without that knowledge, it can’t tell whether a note is recommending exactly what’s already written.
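A rough Python sketch of the individual-scoring idea combined with passing in the original scene: each note gets its own call, the prompt carries the original scene plus a few reference notes you have graded yourself, and the sorting happens in code. `ask_model` is a hypothetical wrapper around whatever chat-completion call you already use; the prompt wording, the made-up example notes, and the 0 fallback for unparseable replies are all assumptions to adapt.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical: ask_model sends one prompt to the chat API and returns the reply text.
AskModel = Callable[[str], str]


def build_scoring_prompt(feedback: str, original_scene: str, note: str,
                         references: List[Tuple[str, int]]) -> str:
    """One prompt per note: feedback, original scene, graded reference notes, then the note."""
    examples = "\n\n".join(f"Note: {ref_note}\nScore: {ref_score}" for ref_note, ref_score in references)
    return (
        f"Feedback to implement: {feedback}\n\n"
        f"Original scene:\n{original_scene}\n\n"
        "Score the note from 1 to 10 for how well it implements the feedback, "
        "compared with what the original scene already does. Reply with the number only.\n\n"
        f"Reference examples:\n{examples}\n\n"
        f"Note: {note}\nScore:"
    )


def parse_score(reply: str) -> Optional[int]:
    """Pull the first 1-10 integer out of the reply; None if there isn't one."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None


def score_notes(feedback: str, scene_note_pairs: List[Tuple[str, str]],
                references: List[Tuple[str, int]], ask_model: AskModel) -> List[Tuple[str, int]]:
    """Score every note in its own call, then sort highest first (unparseable replies count as 0)."""
    scored: List[Tuple[str, int]] = []
    for scene, note in scene_note_pairs:
        reply = ask_model(build_scoring_prompt(feedback, scene, note, references))
        score = parse_score(reply)
        scored.append((note, score if score is not None else 0))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def fake_model(prompt: str) -> str:
    """Stand-in for the API: rates the note being judged (after the last "Note:") 9 if it mentions Denny's."""
    target = prompt.rsplit("Note: ", 1)[-1]
    return "9" if "Denny's" in target else "Score: 4/10"


# Made-up scenes, notes, and reference grades, for illustration only.
pairs = [("INT. CAFETERIA - DAY ...", "Layken mocks Gavin's party plans."),
         ("INT. DENNY'S - NIGHT ...", "At Denny's, Layken deflects Will's apology with a cutting joke.")]
refs = [("Layken is sarcastic.", 3),
        ("Layken answers Will's 'how are you' with a dry one-liner, then changes the subject.", 8)]
print(score_notes("Make Layken more rough around the edges.", pairs, refs, fake_model))
```

Because each call sees only one note, where that note sat in the original list no longer matters.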


I would base the analysis of what is best on clearer statements.

Define what good means, and have the model give a score from 1 to 10.

Let gpt-3.5 analyse each note against each criterion of what good means, and at the end let a piece of code calculate the order.

This way you can even rank an unlimited number of items.
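Roughly, the "let code calculate the order" step could look like this in Python. It is a sketch under assumptions: the criterion names and weights are made up, and it assumes you have already asked gpt-3.5 for a 1 to 10 score per note and per criterion (for example, one call each) and stored the numbers in a dict.

```python
from typing import Dict, List, Tuple

# Hypothetical criteria and weights: define your own version of what "good" means.
WEIGHTS: Dict[str, float] = {
    "fits_feedback": 3.0,  # does it actually make Layken rougher / more sarcastic?
    "specificity": 2.0,    # does it point at a concrete scene and line?
    "novelty": 1.0,        # is it something the script doesn't already do?
}


def rank_by_criteria(scores: Dict[str, Dict[str, int]], weights: Dict[str, float],
                     k: int = 10) -> List[Tuple[str, float]]:
    """scores maps each note to its 1-10 score per criterion (a missing criterion counts as 0).
    Returns the top k notes by weighted average; the ordering is done here, not by the model."""
    total_weight = sum(weights.values())
    averages = {
        note: sum(w * per_criterion.get(name, 0) for name, w in weights.items()) / total_weight
        for note, per_criterion in scores.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up scores for three notes, as the model might return them.
example_scores = {
    "Cafeteria": {"fits_feedback": 7, "specificity": 6, "novelty": 5},
    "Denny's": {"fits_feedback": 9, "specificity": 9, "novelty": 8},
    "Classroom": {"fits_feedback": 6, "specificity": 5, "novelty": 4},
}
for note, avg in rank_by_criteria(example_scores, WEIGHTS):
    print(f"{note}: {avg:.2f}")
# Denny's: 8.83, then Cafeteria: 6.33, then Classroom: 5.33
```

Since the model only ever produces numbers and never sees the full list at once, the final order can't be skewed by where a note appeared.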

Try it for yourself:

Sort these fruits from best taste to least:

apple, banana, strawberry,…

And now try this:

I like sweet fruits. Give each fruit a score from 1-10 for sweetness.

You will still get a bad result, but not as bad as in the first fruit assessment.

And now create a list of 10 different criteria for measuring fruits.

You can use GPT for that, too.

Now this is quite interesting. I hadn’t realized the end would become the middle by the time the model should be outputting items from the end. I will try the shuffling technique, with scoring the script notes as they are first produced as a backup. Thank you.