How to avoid repeated words in response?

Hi,
I’m writing a simple script to augment my data with relevant keywords. The data is a simple collection of emojis, and I want to add keywords to each one.

I’m using the API and gpt-3.5-turbo.

The issue is that, more often than not, the model repeats the same keywords over and over at the end of its response.

Here’s my code and prompt:

const completion = await openai.createChatCompletion({
      model: "gpt-3.5-turbo",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        {
          role: "user",
          content: `
          You are a helpful and creative assistant.
          Write a comprehensive list of keywords for an emoji.
          Include keywords that represent places, verbs and actions.

          Emoji:
          - Emoji: ${emoji.symbol}, // the emoji itself
          - Name: ${emoji.name},  // the name of the emoji
          - Existing keywords: ${emoji.keywords}. // some existing keywords
          
          - Requirement:
          All keywords must be single words only; avoid using sentences or expressions.
          Never repeat keywords to ensure variety in the list.
          
          - Output:
          Generate an extensive list without repeating words or compromising relevance.
          Return only a comma separated list of keywords.`,
        },
      ],
      temperature: 0.4,
      max_tokens: 250,
    });

And here’s an example response for the emoji :heart:

love, affection, passion, emotion, romance, adoration, care, devotion, fondness, attachment, desire, warmth, tenderness, infatuation, sentiment, feeling, heartbeat, pulse, beloved, sweetheart, crush, admiration, enchantment, enchanting, captivating, endearing, charming, delightful, attractive, beautiful, lovely, pretty, stunning, gorgeous, striking, captivating, alluring, mesmerizing, bewitching, spellbinding, irresistible, attractive, attractive, desirable, desirable, attractive, attractive, desirable, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive, attractive, desirable, attractive

It’s worth noting that sometimes the prompt works great and I get a list of unique keywords. But often, I get results like the above.

I’ve tried many variations of the same prompt without much success.

Is there anything I can try to avoid the repeated keywords?

Welcome to the forum!

What happens if you set that as the system prompt, rather than the user?

Thank you @Foxabilo!
I will give it a try. I read somewhere that ChatGPT usually ignores the system prompt, but I’ll try.

No luck. I still got repeated keywords.

For example: :framed_picture:

art, frame, museum, painting, picture, wall, decor, gallery, exhibition, display, hanging, masterpiece, portrait, landscape, still life, artwork, canvas, creative, visual, composition, frameless, framed, photography, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot, snapshot

It feels like the model repeats words to fill up the max_tokens parameter as much as possible. But if I leave it blank, the call runs forever as the model keeps thinking of hundreds of keywords.

You could try using the frequency penalty parameter in your API call. It penalizes tokens in proportion to how often they have already appeared in the generated text, which makes them less likely to be generated again.

It might not be as straightforward as giving it a high value from the get-go, but a bit of experimenting with the value might sort out your issue in this case.
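For reference, here is a minimal sketch of where the parameter would go, based on the request in the original post. The helper name `buildKeywordRequest`, the shortened prompt, and the default of 0.2 are assumptions for illustration only; `frequency_penalty` itself is a documented Chat Completions parameter that accepts values between -2.0 and 2.0.

```javascript
// Hypothetical helper (not from the original post): builds the request body
// so the penalty can be tuned in one place.
function buildKeywordRequest(emoji, frequencyPenalty = 0.2) {
  // Accept existing keywords either as an array or as a ready-made string.
  const existing = Array.isArray(emoji.keywords)
    ? emoji.keywords.join(", ")
    : String(emoji.keywords);

  // Shortened version of the prompt from the question, for illustration.
  const prompt = [
    "Write a comprehensive list of keywords for an emoji.",
    `Emoji: ${emoji.symbol}`,
    `Name: ${emoji.name}`,
    `Existing keywords: ${existing}`,
    "All keywords must be single words. Never repeat a keyword.",
    "Return only a comma separated list of keywords.",
  ].join("\n");

  return {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: prompt },
    ],
    temperature: 0.4,
    max_tokens: 250,
    // Positive values make already-generated tokens less likely to recur.
    frequency_penalty: frequencyPenalty,
  };
}

// With the client from the question, the call would look like:
//   const completion = await openai.createChatCompletion(buildKeywordRequest(emoji));
const request = buildKeywordRequest({
  symbol: "❤️",
  name: "red heart",
  keywords: ["love", "heart"],
});
console.log(request.frequency_penalty);
```

Starting low and raising the value step by step is probably safer than starting high, since a large penalty can also push the model away from relevant words.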

Thanks @udm17! The frequency penalty did the trick.
I could go as low as 0.1 or 0.2 and it tremendously reduced the repeated words.
I also wrote a simple JS function to make sure there are no repeated keywords in the final data, but the frequency penalty was a great shout.
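The function itself was not posted, so the following is only a sketch of what such a cleanup step might look like. The name `dedupeKeywords` and the choice to lowercase and trim each keyword are assumptions, not the poster's actual code.

```javascript
// Hypothetical cleanup step: turn the comma separated response into a list
// of unique keywords, keeping the order of first appearance.
function dedupeKeywords(responseText) {
  const seen = new Set();
  const unique = [];
  for (const raw of responseText.split(",")) {
    // Normalize so "Attractive" and " attractive" count as the same keyword.
    const keyword = raw.trim().toLowerCase();
    if (keyword !== "" && !seen.has(keyword)) {
      seen.add(keyword);
      unique.push(keyword);
    }
  }
  return unique;
}

const cleaned = dedupeKeywords(
  "love, passion, attractive, desirable, attractive, Desirable"
);
console.log(cleaned.join(", "));
// prints: love, passion, attractive, desirable
```

This only cleans up the output after the fact. The repeated tokens still count against max_tokens, so it works best alongside the frequency penalty, not as a replacement for it.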