Text Generation: I'm struggling with Output Consistency

Hi,

I’m having a super difficult time getting the gpt-3.5-turbo API to consistently return the correct number of output lines (it only manages it about 40% of the time).

I’ll attach my Python code below. The input “MTD.txt” is a sample of 6 song lyric lines separated by newlines, and the script currently writes the output to the file “cherry_coke8.txt” in the same directory.

I’m trying to do this with one system prompt, and multiple user prompts (I’d rather not use the giant block of instructions during each request because of the input token cost).

It sometimes outputs the right descriptions, but it’s extremely flaky.

Any advice is very appreciated!

from openai import OpenAI

# Reads the clean txt file
def readCleanTXT(cleanTXT):
    with open(cleanTXT, 'r') as file:
        lines = file.readlines()
    return lines
    

# Runs the OpenAI API on the clean txt file
def generatePrompts(cleanTXT, wordPhrase1, wordPhrase2, wordPhrase3):
  client = OpenAI()


  systemMessageWhole = {"role": "system", "content":
      "You are a creative prompt creator." + "\n" + \
      "The user will pass you lines of a song, one at a time, and you will generate creative prompts for each line (FOR EACH USER REQUEST)." + "\n" + \
      "Then, the images produced from your prompts will be shown concurrently with the song’s audio, creating a slideshow-like music video." + "\n" + \
      "Note that the image generation model takes in each prompt separately, so that you cannot reference past prompts or scenes from past prompts, and instead must describe them again." + "\n" + \
      "Yet, your prompts must still show an overarching theme / arc / story, and relate to each other in content." + "\n" + \
      "The user will pass in three words or phrases along with the lines to their song, which will guide your prompts in terms of “mood” or “theme”." + "\n" + \
    """Criterion:
      Respond to each line individually with a single sentence. Prompts are not necessarily in full sentences, and may contain many adjectives and dependent clauses separated by commas. Maximimze Detail.
      Do not combine responses or directly refer to past lines.
      Use clear language that focuses on visual imagery.
      ***The user will input lyrics 1 by 1. Make sure that you are returning ALL your prompts in your completion, and not just one.***
      DO NOT USE INTERPRATIVE OR ABSTRACT VERBS: Do not analyze. Do not use any words "representing" “showcasing” or “emphasizing” or “highlighting” or "portraying" or similar. Simply state what is literally in the image. Don't infer meaning. Just describe clear, vivid pictures.
      Adhere strictly to the mood words provided. These should influence your prompts.
      Keep your response to a single sentence, as descriptive as possible while under 20 words.
      A sample of a good prompt: A powerful mysterious sorceress, casting lightning magic, detailed clothing, digital painting, hyperrealistic, fantasy, Surrealist, full body, highly detailed, dystopian, iridescent gold""" + "\n" "\n" + \
      "These are the user's three words/phrases: " + "\n" + \
      wordPhrase1 + ", " + wordPhrase2 + ", " + wordPhrase3}
  
  lineTotal = 6

  userMessages = [
      { "role": "user",
          "content": f"Line {index + 1} of {lineTotal}: {line}"
          } for index, line in enumerate(cleanTXT)
  ]

  print([systemMessageWhole]+userMessages)

  completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=
      [systemMessageWhole] + userMessages
)
  
  # Writes the finished file to local directory as .txt
  with open("cherry_coke8.txt", 'w') as file:
      #print(completion.choices[0].message.content)
      file.write(completion.choices[0].message.content)


if __name__ == "__main__":

  readLyrics = readCleanTXT("MTD.txt")

  wordPhrase1 = input("Enter word/phrase 1: ")
  wordPhrase2 = input("Enter word/phrase 2: ")
  wordPhrase3 = input("Enter word/phrase 3: ")
  

  # The actual call
  generatePrompts(readLyrics, wordPhrase1, wordPhrase2, wordPhrase3)

lineTotal is currently 6 because I’m testing this with a 6-line song sample, but I hope to apply this Python script to any .txt file in the future!

Let’s make your system message readable to others, and fix some misspellings along the way:

systemMessageWhole = {
    "role": "system",
    "content": f"""You are a creative prompt creator.
The user will pass you lines of a song, one at a time, and you will generate creative prompts for each line (FOR EACH USER REQUEST).
Then, the images produced from your prompts will be shown concurrently with the song’s audio, creating a slideshow-like music video.
Note that the image generation model takes in each prompt separately, so that you cannot reference past prompts or scenes from past prompts, and instead must describe them again.
Yet, your prompts must still show an overarching theme / arc / story, and relate to each other in content.
The user will pass in three words or phrases along with the lines to their song, which will guide your prompts in terms of “mood” or “theme”.
Criteria:
- Respond to each line individually with a single sentence. Prompts are not necessarily in full sentences, and may contain many adjectives and dependent clauses separated by commas. Maximize detail.
- Do not combine responses or directly refer to past lines.
- Use clear language that focuses on visual imagery.
- The user will input lyrics 1 by 1. Make sure that you are returning ALL your prompts in your completion, and not just one.
- DO NOT USE INTERPRETIVE OR ABSTRACT VERBS: Do not analyze. Do not use any words “representing” “showcasing” or “emphasizing” or “highlighting” or “portraying” or similar. Simply state what is literally in the image. Don’t infer meaning. Just describe clear, vivid pictures.
- Adhere strictly to the mood words provided. These should influence your prompts.
- Keep your response to a single sentence, as descriptive as possible while under 20 words.
- A sample of a good prompt: A powerful mysterious sorceress, casting lightning magic, detailed clothing, digital painting, hyper-realistic, fantasy, Surrealist, full body, highly detailed, dystopian, iridescent gold

These are the user’s three words/phrases:
{wordPhrase1}, {wordPhrase2}, {wordPhrase3}"""
}

One way to help the AI keep track of how much it has generated so far is to ask for the output as a numbered list in markdown format.

If you don’t want the numbers, you can then remove them programmatically or with another AI pass.
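Stripping the numbers programmatically takes only a few lines. This is a minimal sketch, assuming the reply comes back as one markdown item per line like `1. prompt text`; `strip_numbering` and `count_items` are hypothetical helper names. Counting the numbered items also gives a cheap check against the number of lyric lines before you write the file.

```python
import re

# A leading markdown list marker such as "1. " or "12) " (assumed output format).
_NUMBER_PREFIX = re.compile(r"^\s*\d+[.)]\s+")


def strip_numbering(text: str) -> list[str]:
    """Return the non-empty lines of a numbered list with their numbers removed."""
    lines = []
    for raw in text.splitlines():
        if raw.strip():  # ignore blank lines between list items
            lines.append(_NUMBER_PREFIX.sub("", raw, count=1).strip())
    return lines


def count_items(text: str) -> int:
    """Count lines that start with a list number, e.g. to compare with the lyric count."""
    return sum(1 for raw in text.splitlines() if _NUMBER_PREFIX.match(raw))
```

For example, only write the file when `count_items(reply) == len(lyrics)`, and then write `"\n".join(strip_numbering(reply))`.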

This seems very convoluted. You likely want to specify that the entire “poem” generated up to that point must be output again, with the new text appended only after that output.

In a multi-turn conversation, where a growing conversation history must be passed back to the AI with every call, you are not saving tokens by making multiple calls; instead, you are inflating token usage.
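A rough back-of-the-envelope comparison makes the point. The token sizes below are made-up assumptions, not measurements: when history is resent, each turn pays again for every earlier line and reply, so the total grows roughly quadratically with the number of lines, while a single request grows linearly.

```python
def resend_history_tokens(system: int, per_line: int, per_reply: int, n: int) -> int:
    """Total input tokens when each of n turns resends the system prompt plus all prior turns.

    Equivalent closed form: n*system + n*per_line + (per_line + per_reply) * n*(n-1)/2
    """
    total = 0
    for turn in range(1, n + 1):
        history = (turn - 1) * (per_line + per_reply)  # earlier user lines and assistant replies
        total += system + history + per_line
    return total


def single_call_tokens(system: int, per_line: int, n: int) -> int:
    """Input tokens when all n lines go out in one request with one system prompt."""
    return system + n * per_line
```

With a 400-token system prompt, 15-token lines, 30-token replies and 6 lines, that is 3165 input tokens across the turns versus 490 for one call.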


Thanks. I wasn’t aware the entire conversation history would be relayed per request.

How would you recommend structuring the requests? I fundamentally just want to be able to input a .txt file, and have it output a .txt file with an identical number of lines, where each line is rewritten to be a prompt for an image generation model.

Well, in your case it wouldn’t cost more, because you aren’t giving a chatbot its conversation history; you are just making independent calls:

    messages=
      [systemMessageWhole] + userMessages

Here’s a little demo of a chatbot loop that provides a few turns of conversation history, for talking to an AI the way your prompt seems to describe.

messages = system + chat[-10:] + user
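The rest of that demo isn't preserved here, so the following is only a sketch of what the surviving line implies, not the original code: a loop that sends the system message, the last few turns of history, and the new user turn. The placeholder system prompt, `MAX_HISTORY`, and the function names are assumptions; `client` is expected to be an `openai.OpenAI()` instance, as in the code earlier in the thread.

```python
from typing import Any

SYSTEM = [{"role": "system", "content": "You are a helpful assistant."}]  # placeholder prompt
MAX_HISTORY = 10  # last 10 messages, i.e. about 5 user/assistant exchanges


def build_messages(chat: list[dict], user_text: str) -> list[dict]:
    """System message + trimmed history + new user turn (system + chat[-10:] + user)."""
    user = [{"role": "user", "content": user_text}]
    return SYSTEM + chat[-MAX_HISTORY:] + user


def chat_loop(client: Any, model: str = "gpt-3.5-turbo") -> None:
    """Interactive loop; type 'exit' to stop. `client` is an openai.OpenAI() instance."""
    chat: list[dict] = []
    while True:
        user_text = input("\nprompt> ")
        if user_text.strip().lower() == "exit":
            break
        response = client.chat.completions.create(
            model=model, messages=build_messages(chat, user_text)
        )
        reply = response.choices[0].message.content
        print(reply)
        # Store both sides so the next turn sends this exchange as history.
        chat += [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": reply},
        ]
```

Run it with `chat_loop(OpenAI())` after `from openai import OpenAI`. Trimming by message count keeps input cost bounded, at the price of the bot forgetting older turns.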

I’m not sure that’s what I’m looking for. I intend for this Python script to be mostly backend.

In the frontend, the user should just provide a .txt file of lyrics (plus three words/phrases to guide the tone of the generated descriptions), and I want this script to come up with a creative image prompt for each lyric line (e.g. “Baby you’re a firework” → “A dazzling display of fireworks over a pastel sky, watercolor, photorealistic, vibrant hues”).

… And I want it to write its response to a .txt file with the same number of lines, without any further user input.

I’m wondering if you have any ideas on how to do this well and consistently with as few input tokens as possible?

I think I see where you’re going. Unlike the interactive user scenario you paint in your prompt, you just want each line turned into a more colorful depiction for an AI image creator, with all lines processed at once.

You probably don’t want to refer to “the user”; as I said, that will get the bot chatting.

A good way to keep the AI on track in a single job might be to specify a JSON output format, one that includes the original line followed by the colorful imagery line. Let’s see what I can do without going overboard on distracting words.

System:
You are an image creator, whose purpose is to take text of a poem or song lyrics, and line-by-line, transform the scenario of the poem at that point into text that describes a vibrant and colorful depiction of the story at that line. Each line you write will be a prompt used by an AI image creator to make a slideshow that accompanies the poem or song. Pictures shall be independent and completely described, but using similar artistic style, as the AI image creator has no memory. You only robustly describe visual elements that create a non-moving static picture, not a script nor actions. You are an automated backend and there is no user to communicate with. Output will be valid JSON with keys “line#”, “output#”, for the line numbers.

// Example input like that from the user message
The outlook wasn’t brilliant for the Mudville nine that day
The score stood four to two, with but one inning more to play

// Example output shall be your only response
{
"line1": "The outlook wasn’t brilliant for the Mudville nine that day",
"output1": "The scene intensifies as the game hangs in the balance. The scoreboard looms overhead, the numbers illuminated against the fading daylight. The field is a vibrant mosaic of green, with players poised in anticipation, shadows elongated as the sun begins its descent"
},
{…

User input:

Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore—
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door

gpt-3.5-turbo output:

{
"line1": "Once upon a midnight dreary, while I pondered, weak and weary",
"output1": "The scene is set in a dimly lit study, with shelves upon shelves of old books lining the walls. The protagonist sits at a desk, hunched over, deep in thought. The room is filled with an air of melancholy and exhaustion.",
"line2": "Over many a quaint and curious volume of forgotten lore—",
"output2": "The camera zooms in on the books, revealing their intricate designs and weathered pages. Each book tells a story of its own, with faded titles and delicate illustrations. The room is filled with a sense of mystery and intrigue.",
"line3": "While I nodded, nearly napping, suddenly there came a tapping",
"output3": "The protagonist’s eyes grow heavy as they struggle to stay awake. The room becomes hazy and dreamlike. Suddenly, a sound breaks the silence—a gentle tapping that echoes through the room.",
"line4": "As of some one gently rapping, rapping at my chamber door",
"output4": "The camera pans towards the chamber door, revealing its ornate design and intricate car…

Placing each input line right before the text generated for it leaves the model no leeway and makes it easier to follow along.

gpt-3.5-turbo seems to think it is directing a music video or writing a script, which is hard to discourage. A better example may give better performance, or you can pay for GPT-4 to do it better.
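The interleaved `line#`/`output#` format also makes the reply easy to verify in code. A minimal sketch, assuming the model returns a single flat JSON object with those keys (as in the output above) and that you pass in the lyric lines without blank ones; `parse_and_check` is a hypothetical name.

```python
import json


def parse_and_check(raw: str, lyrics: list[str]) -> list[str]:
    """Return output1..outputN from the model's JSON reply, in lyric order.

    Raises ValueError (json.JSONDecodeError is a subclass) if the reply is not one
    JSON object, if any lineN/outputN key is missing, or if lineN does not echo lyric N.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a single JSON object")
    outputs = []
    for n, lyric in enumerate(lyrics, start=1):
        echoed, output = data.get(f"line{n}"), data.get(f"output{n}")
        if echoed is None or output is None:
            raise ValueError(f"line{n}/output{n} missing")
        # A mismatch suggests the model skipped, merged, or reordered lines.
        if str(echoed).strip() != lyric.strip():
            raise ValueError(f"line{n} does not echo the input lyric")
        outputs.append(str(output).strip())
    return outputs
```

An exact comparison is strict; the model may change quotes or punctuation when echoing a line, so you might loosen it (for example, comparing lowercase letters only).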


Thank you for the help. Seriously. I’ve been working on this for the last couple of days and it’s been endlessly frustrating.

With the revised message below, I get about a 2-in-3 success rate. In the other 1 in 3, gpt-3.5 either outputs only line1 with its output, or outputs far fewer lines than it was given.

Is this just the nature of interacting with LLMs and something I have to accept? Should I just add a check that the output file is long enough and, if not, run the API completion again?

Your purpose:
You will be given the lyrics to a song. For each line in this song’s lyrics, generate a prompt for an image generation model. Your prompts should be visually descriptive yet concise (under 25 words each). Describe each image fundamentally, and elaborate on the additional modifiers (style, colors, mood words, etc.) The image generation model has no memory, so you may not reference past prompts. Create aesthetic, visually-striking images with a singular subject.

Avoid abstract verbs altogether: DO NOT USE "representing" or "showcasing" or "portraying" or "symbolizing" or anything like them. You simply describe what is literally in the image. Robustly describe visual elements that create a non-moving static picture, not a script nor actions. You are an automated backend and there is no user to communicate with. Output will be valid JSON with keys “line#”, “output#”, for the line numbers.

// Example input line
Line 1 of 7: Cause baby, you’re a firework

{
// Example output should be your only response
“line1”: “Cause baby, you’re a firework”
“output1”: “Pastel firework explosions in the night sky, Fourth of July, celebratory, hyperrealistic, detailed”
},

Or is this something that is worth workshopping to get perfect?
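The check-and-retry idea asked about above can be sketched as follows. This assumes a plain-text reply with one prompt per line; `complete` is a hypothetical zero-argument function that wraps your `client.chat.completions.create(...)` call and returns `completion.choices[0].message.content`. With the JSON format you would swap the line splitting for a JSON parse.

```python
from typing import Callable


def nonempty_lines(text: str) -> list[str]:
    """Split a reply into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def generate_with_retry(
    expected: int, complete: Callable[[], str], max_attempts: int = 3
) -> list[str]:
    """Call `complete` until its reply has exactly `expected` non-empty lines."""
    counts = []
    for _ in range(max_attempts):
        lines = nonempty_lines(complete())
        if len(lines) == expected:
            return lines
        counts.append(len(lines))  # remember what went wrong for the error message
    raise RuntimeError(f"expected {expected} lines, got {counts} over {max_attempts} attempts")
```

Each retry is a full request, so failures multiply cost; capping `max_attempts` keeps that bounded.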

Another thing to note is that the generated prompts get drastically worse as the number of input lines grows. I might do some sort of batching to combat this, but let me know what you think.
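Batching could look like this. A sketch only: `generate` is a hypothetical function that sends one chunk of lyric lines per request (with the same system prompt) and returns one prompt per line; the batch size is a knob to tune, not a known-good value.

```python
from typing import Callable


def batches(lines: list[str], size: int) -> list[list[str]]:
    """Split lyric lines into consecutive chunks of at most `size` lines."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def run_in_batches(
    lines: list[str], size: int, generate: Callable[[list[str]], list[str]]
) -> list[str]:
    """Send each chunk to `generate` (one API request per chunk) and join the results in order."""
    outputs: list[str] = []
    for chunk in batches(lines, size):
        result = generate(chunk)
        if len(result) != len(chunk):  # keep the one-prompt-per-line guarantee per batch
            raise ValueError(f"batch returned {len(result)} prompts for {len(chunk)} lines")
        outputs.extend(result)
    return outputs
```

Smaller batches mean each request repeats the system prompt, so you trade some input tokens for more reliable line counts; the model also loses sight of lines in other batches, which may weaken the overall story arc.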

Here’s the full code for reference:

from openai import OpenAI

# Reads the clean txt file
def readCleanTXT(cleanTXT):
    with open(cleanTXT, 'r') as file:
        lines = file.readlines()
    return lines
    

# Runs the OpenAI API on the clean txt file
def generatePrompts(cleanTXT, wordPhrase1, wordPhrase2, wordPhrase3):
  client = OpenAI()


  systemMessageWhole = {"role": "system", "content":

"""
Your purpose:
You will be given the lyrics to a song. For each line in this song’s lyrics, generate a prompt for an image generation model. Your prompts should be visually descriptive yet concise (under 25 words each). Describe each image fundamentally, and elaborate on the additional modifiers (style, colors, mood words, etc.) The image generation model has no memory, so you may not reference past prompts. Create aesthetic, visually-striking images with a singular subject.

Avoid abstract verbs altogether: DO NOT USE "representing" or "showcasing" or "portraying" or "symbolizing" or anything like them. You simply describe what is literally in the image. Robustly describe visual elements that create a non-moving static picture, not a script nor actions. You are an automated backend and there is no user to communicate with. Output will be valid JSON with keys “line#”, “output#”, for the line numbers.

// Example input line
Line 1 of 7: Cause baby, you’re a firework

{
// Example output should be your only response
“line1”: “Cause baby, you’re a firework”
“output1”: “Pastel firework explosions in the night sky, Fourth of July, celebratory, hyperrealistic, detailed”
},
"""  


 }
  

  userMessages = [
      { "role": "user",
          "content": f"Line {index + 1} of {len(cleanTXT)}: {line}"
          } for index, line in enumerate(cleanTXT)
          
  ]

  print([systemMessageWhole]+userMessages)

  completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=
      [systemMessageWhole] + userMessages
)
  
  # Writes the finished file to local directory as .txt
  with open("amber.txt", 'w') as file:
      #print(completion.choices[0].message.content)
      file.write(completion.choices[0].message.content)


if __name__ == "__main__":

  readLyrics = readCleanTXT("pasta.txt")

  wordPhrase1 = input("Enter word/phrase 1: ")
  wordPhrase2 = input("Enter word/phrase 2: ")
  wordPhrase3 = input("Enter word/phrase 3: ")
  

  # The actual call
  generatePrompts(readLyrics, wordPhrase1, wordPhrase2, wordPhrase3)
  

You put the output example heading inside the JSON curly braces, and the example is missing the comma between its key/value pairs.

The example input/output doesn’t show you inputting more than one line, so nothing gives the impression that the AI should continue for all lines. I don’t think “Line # of #” is productive; the AI will track the numbers in its output instead.
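Both points can be addressed by generating the example rather than typing it. A sketch, assuming you keep the example inside the system message: `json.dumps` guarantees the example JSON is valid, and the example covers two lines so the model sees the pattern continue. The example lyrics, prompts, and function names are made up for illustration, and the user message carries all lyric lines plainly, without “Line # of #” labels.

```python
import json


def build_system_prompt(instructions: str, example_in: list[str], example_out: list[str]) -> str:
    """Instructions followed by a multi-line example whose JSON is produced by json.dumps."""
    example = {}
    for n, (line, prompt) in enumerate(zip(example_in, example_out), start=1):
        example[f"line{n}"] = line      # echo of the input line
        example[f"output{n}"] = prompt  # image prompt for that line
    parts = [
        instructions,
        "",
        "Example input:",
        *example_in,
        "",
        "Example output (this JSON object is your only response):",
        json.dumps(example, ensure_ascii=False, indent=2),  # always valid JSON, commas included
    ]
    return "\n".join(parts)


def build_user_message(lyrics: list[str]) -> dict:
    """All lyric lines in one user message, one per line, without "Line # of #" labels."""
    return {"role": "user", "content": "\n".join(line.strip() for line in lyrics if line.strip())}
```

The resulting messages would be `[{"role": "system", "content": build_system_prompt(...)}, build_user_message(lyrics)]`.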