Hello. We are using the GPT-4 API to generate a large, comprehensive list of data (vocabulary terms), but a large amount of the data we receive is repeated multiple times. Is there a way to upload our existing data so that GPT won’t repeat terms it has already generated?
You can certainly do that.
In the chat format, there are two main ways I can see:
1. Include the assistant’s prior replies in the conversation, just as if you had been using a chatbot:
user: provide vocabulary words
assistant: blah blah
user: provide more vocabulary words
assistant: menehune blalah
2. Amend your user input:
user:
// existing vocabulary words:
blah menehune blalah
// instruction: provide even more vocabulary words not listed
assistant: kine broke da mouf ono grinds
Such continuation avoids overlap, but you can’t trust ChatGPT not to forget its chat once the conversation outgrows the context window.
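If you’d rather script the second approach than paste into a chat window, here’s a minimal sketch against the chat completions endpoint. It assumes the requests library and an OPENAI_API_KEY environment variable; the more_terms name is just illustrative:

import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"

def more_terms(existing_terms, topic="astronomy"):
    # Put the already-generated terms in the prompt and ask for new ones only.
    prompt = (
        "// existing vocabulary words:\n"
        + ", ".join(existing_terms)
        + f"\n// instruction: provide more {topic} vocabulary terms"
        " not listed above, one per line"
    )
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4",
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.7,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].splitlines()

Keep the caveat above in mind: once existing_terms no longer fits comfortably in the context window, the model will start to repeat itself anyway, so de-duplicate on your side as well.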
Better might be to define particular topical areas of data that would naturally have no overlap.
You can ask the AI to write you a de-duplicating Python script just to make sure.
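Something like this is enough (a minimal sketch; it assumes your terms are collected one per line in a terms.txt file, and the filenames are just placeholders):

seen = set()
unique = []
with open("terms.txt") as f:
    for line in f:
        term = line.strip()
        key = term.lower()  # compare case-insensitively
        if term and key not in seen:
            seen.add(key)
            unique.append(term)

with open("terms_deduped.txt", "w") as f:
    f.write("\n".join(unique) + "\n")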
If you’re using the API, you can use “frequency_penalty” and “presence_penalty”.
Both of these parameters are useful for controlling the repetitiveness and diversity of the content generated by the model. By adjusting them, you can fine-tune the output to avoid redundancy and encourage the generation of unique text, which is especially valuable in tasks like creating extensive lists, brainstorming, or writing content where variety is key.
Here’s an example:
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Generate a comprehensive list of vocabulary terms related to astronomy:"}
    ],
    "temperature": 0.7,
    "max_tokens": 150,
    "top_p": 1,
    "frequency_penalty": 0.5,
    "presence_penalty": 0.5
  }'
The frequency and presence penalties are not word-banishers, though: they operate on tokens and can have unforeseen effects, such as the cumulative penalty eventually suppressing the quotation-mark token you need for valid JSON.
That’s correct 😁.
To control the results, Kevin, use the Playground and combine both of our responses: try a better prompt, feed back your past responses, run a Python de-duplication pass, and use these parameters.