How do you make GPT3.5 API use specific keywords?

If you’re trying to get it to generate 13 separate articles, then do them all as separate queries.

Oh actually I meant there are 13 more pairs of instructions per 1 article. It’s essentially 1 set of instructions for each heading and content underneath, with a few others like these keyword instructions, for a total of 13-14 sets of user/assistant instructions per 1 article. EDIT: This has since been changed to just 13 user instructions instead of 13 sets of user/assistant instructions.

Also, @_j has a great point with “You might even SEO just by telling the AI it is a SEO optimizer…”, I bet that would improve the quality of the content a bunch, once you split this monster query into a bunch of smaller ones.

Unfortunately for my use case, the keywords have to be a specific list I give it because the article is then checked against a different SaaS (not mine) to ensure these keywords are present. This step is client facing, so even if the generated article is SEO optimized, if it doesn’t contain the keywords I specify, their SaaS marks it as a ‘problem’ or ‘not optimized.’ Somewhat annoying since they’re SEO optimized either way I know, but such is the request I’ve been given :smiling_face_with_tear:

I do have an upcoming project that I’ll be using your suggestions for, however. Much easier when there won’t be clients involved in the process.

I’m curious about your usage of user/assistant instruction pairs over system message. Do you mean that you’re providing the assistant response, or are you describing a chat history where the assistant answer is fed back 13-14 times as the answer is refined?

If you list all of your instructions and a set of actual keywords I can help you with your issue.

They aren't mutually exclusive; you can tell it both to be an SEO master and also explain and command it to use exactly those keywords. Start the overall prompt with something like “You're an SEO guru and I need your help.” Then explain the general goal, but don't give explicit instructions yet. Next, explain something like, “To do this I have a set of keywords, [insert list here], and I need them all to be present in the output. Please include them organically and in an SEO optimized way.” Then explain the details (your other rules), and at the end add your imperatives (i.e. your commands, voiced as orders, not requests), like “The output must include exactly the list of SEO keywords I gave you, which was [insert list again], and they must appear organically as if they just came to your mind as you were writing.”

I'll revisit this thread this evening and try to provide a concrete example. It would help to have some examples of your other 13 constraints, or things like them. Also, that's really too many instructions to expect a good one-shot article from; you should find ways to combine or eliminate some of them. PM me if you want help brainstorming how to do that but don't want to post them publicly.
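For anyone who wants to try that ordering right away, here is a minimal sketch of how the messages could be assembled: role first, keywords early, the detailed rules in the middle, and the imperative repeated at the end. The function name `build_messages` and its parameters are made up for illustration, and the wording of each message is only one possible phrasing.

```python
def build_messages(topic, keywords, rules):
    """Assemble chat messages: role first, keywords early, imperatives last."""
    keyword_list = ", ".join(keywords)
    messages = [
        # 1. Set the role and the general goal, without explicit instructions yet.
        {"role": "system",
         "content": f"You are an SEO specialist helping me write an article about {topic}."},
        # 2. Introduce the keyword list as a polite request.
        {"role": "user",
         "content": (f"To do this I have a set of keywords: {keyword_list}. "
                     "I need them all to be present in the output. "
                     "Please include them organically and in an SEO optimized way.")},
    ]
    # 3. The detailed rules (headings, formatting, and so on), one message each.
    messages += [{"role": "user", "content": rule} for rule in rules]
    # 4. Close with the imperative, repeating the keyword list as an order.
    messages.append({
        "role": "user",
        "content": (f"The output must include every one of these keywords: {keyword_list}. "
                    "They must appear organically."),
    })
    return messages
```

The returned list can be passed as the `messages` argument of a chat completion call. Whether this ordering actually improves keyword coverage is something to test rather than assume.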

Here is some code I used to specify 21 tokens as the only available tokens to be returned (a bias value of 100 across the board). I haven't experimented much with logit_bias, but here is some starter code to get the token IDs.

import tiktoken

encoding = tiktoken.encoding_for_model("ada")

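def get_token_id(word):
    # Returns only the first token id; a string that encodes to
    # several tokens is truncated to its first token.
    token_ids = encoding.encode(word)
    return token_ids[0] if token_ids else None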

words = [
    '0', ' 0', '0 ',
    '1', ' 1', '1 ',
    '2', ' 2', '2 ',
    '3', ' 3', '3 ',
    '4', ' 4', '4 ',
    '5', ' 5', '5 ',
    '6', ' 6', '6 '
]
bias = {str(get_token_id(word)): 100 for word in words}

You didn’t read more than the headline. The point here is that the contents need to remain. However, it is just fine if they are rewritten by an AI, have awkward words or phrases inserted, don’t sound natural or preserve the original intent - the only purpose is to spam up the internet.

From earlier in the thread:

This is one of the options I’ve tried, but it didn’t seem to work either. After a little more research, I think I have to use logit_bias for this, but I’m not sure how to programmatically tokenize keyword list inputs.

Should have replied to that directly. My use case was different from what is trying to be accomplished in the thread.

I'm torn between “I'd rather not have my brain assaulted by terribly written everything, at least make it read well” and “if it's all obviously AI-written I can mentally filter it out and ignore it more easily”.

Sure, let me post a little more for clarity. I can’t post the complete code or specifics of what it’s generating, but here’s what I’m doing.

Here is the web app side where I input the article’s contents in a form.

main.py

import chat_functions
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def generate_article():
    if request.method == 'GET':
        return render_template('index.html')

    title = request.form.get('title').title()
    materials = request.form.get('materials')
    instructions = request.form.get('instructions')
    keywords = request.form.get('keywords')

    content = chat_functions.ArticleGenerator(title, materials, instructions, keywords)
    article = content.generate_article()

    return render_template('index.html', article=article)

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8080, debug=False)

The chat_functions being imported in main.py above are my GPT instructions, which look like the following (please excuse the text formatting). Small side note, I changed the 13 pairs to just 13 user instructions, I’ll edit my comment above.

chat_functions.py

import openai
import os
from dotenv import load_dotenv

class ArticleGenerator():
    def __init__(self, title, materials, instructions, keywords):
        self.title = title
        self.materials = materials
        self.instructions = instructions
        self.keywords = keywords
        load_dotenv()
        openai.api_key =  os.getenv('OPENAI_API_KEY', 'xxxxxxx')
        self.messages = [
        {"role": "system", "content": f'''Generate articles that follow the users instructions exactly.'''},
        {"role": "user", "content": f'''I want you to write an article for a {title} project according to my instructions and formatting rules.'''},
        {"role": "user", "content": f'''First, write {title} as a first level heading. Under that heading, write an introduction to the project.'''},
        {"role": "user", "content": f'''Next, write 'What is {title}?' as a second level heading. Under that heading, write about what {title} is.'''},
        {"role": "user", "content": f'''Next, write 'What You Need to Make {title} At Home' as a second level heading. Under that heading, take this list of materials
                                            and write about why each material is needed for the project: {materials}. Bold the name of the material, then write
                                            no more than 40 words about why it is needed. Separate each material and its description with a line break.
                                            Do not include the material's quantity.'''},
        {"role": "user", "content": f'''Next, write 'Tips for Making the Best {title}' as a second level heading. Under that heading, include 5-7 points about how to build the best {title}.'''}
        ]

    def generate_article(self):
        completion = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=1.2,
            presence_penalty=0.0,
            frequency_penalty=0.0,
            logit_bias={48126: -100},
            messages=self.messages
            )
        return completion.choices[0].message.content

These aren’t the complete instructions, but this is basically how I’ve got it working now. Each instruction creates an H1, H2, or H3, then writes and formats the content for each heading. This part is working great; it writes relevant content that is formatted how I specify. The issue is, while it’s writing these sections of content, I need it to use specific keywords, which I can’t get it to do.

These keywords are always relevant to the topic of the article, I just need the API to actually use them. For example, say the article title is ‘Painted DIY Bookshelves’ and a keyword is ‘coats’ (as in coats of paint). The API could either write a new sentence using the word (i.e., ‘Apply two coats of paint, letting it dry in between’), or it could replace an existing word with the keyword when it can be used interchangeably (i.e., using ‘coats’ instead of ‘layers’). It’s doing neither currently.

@chrstfer I’m going to completely redo the chat instructions following your recommendations today, so I’ll update or send a PM to take you up on your brainstorming offer. Thanks!

@_j No offense, but you are not understanding the point of this thread and it's convoluting the conversation. What you're talking about is keyword stuffing, which Google has been penalizing for years. What I'm talking about is keyword optimizing, which is still very much a ranking factor. Having this program write ‘fridge’ instead of ‘refrigerator’, or create new sentences using keywords that are relevant to the topic, is neither spam nor too much to ask from ChatGPT. Every website anywhere near the top 5 pages of Google's SERPs uses some variation of keyword optimization to help get there, which is why I'm not here to debate SEO strategy.

@ethan.peck I appreciate the example! If I can’t get this to work with @chrstfer suggestions, I’ll try this out with the logit_bias.

So here’s what I tried:

{"role": "system", "content": f'''You are an SEO specialist and I need your help writing articles. These need to be well written articles about [blank] that are SEO optimized.'''},
{"role": "user", "content": f'''To do this, I have a set of keywords and phrases, {keywords}, that need to be included in the article. Please include them organically and in an SEO optimized way.'''},

# Examples of these instructions can be found in my comment above
{"role": "user", "content": '''Article instructions here using 10 separate user instructions'''}, 

{"role": "user", "content": f'''“The output must include exactly the list of SEO keywords and phrases I gave you, which was {keywords}. They must appear organically as if they just came to your mind as you were writing.”'''},
{"role": "user", "content": f'''Write this [blank] article according to my instructions.'''},

I also tried combining the first user instruction with the system instruction, but neither worked. I generated 3 articles with these instructions, then took the {keywords} instructions out and generated them again, and the results were pretty identical.

To be clear, it’s writing a very nice article with other relevant keywords, just not the ones I specify.

@ethan.peck I tried your code above and couldn’t get it to work. Here’s my implementation, maybe I did something incorrectly.

import tiktoken
import openai

keywords = request.form.get('keywords')

encoding = tiktoken.encoding_for_model("gpt-4")

def get_token_id(word):
    token_ids = encoding.encode(word)
    return token_ids[0] if token_ids else None

    words = keywords
    bias = {str(get_token_id(word)): 100 for word in words}

class ArticleGenerator():
    def __init__(self, bias):
    self.bias = bias
    self.messages= #Chat instructions here

def generate_article(self, bias):
    completion = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=1.2,
        presence_penalty=0.0,
        frequency_penalty=0.5,
        logit_bias={bias: 100},
        messages=self.messages
        )
    return completion.choices[0].message.content

content = chat_functions.ArticleGenerator(bias)
article = content.generate_article()

EDIT: It looks like logit_bias isn’t working at all actually. I ran this a second time to exclude just the token for the word ‘typically’ like this

logit_bias={48126: -100},

but the generated content still included ‘typically’ 3 times.
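A few things in the attempt above would stop it from working regardless of how the model behaves. As pasted, the `bias` lines sit after the `return` inside `get_token_id`, so they never run. `for word in words` iterates over the characters of the form string rather than over keywords. `logit_bias={bias: 100}` uses a dict as a dictionary key, which raises a `TypeError`. Only the first token of each keyword is kept. Below is a hedged sketch of one way to build the dict. The helper names, the comma-separated input format, and the bias value of 5 are all assumptions for illustration; a bias of 100 tends to force the model to emit only those tokens, so a smaller value is used here and would need tuning.

```python
def parse_keywords(raw):
    """Split a comma-separated form field into a clean list of keywords."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def build_logit_bias(keywords, encode, bias=5):
    """Map every token id of every keyword variant to a modest positive bias."""
    logit_bias = {}
    for keyword in keywords:
        # A word at the start of the text and the same word after a space
        # usually encode to different tokens, so cover both forms.
        for variant in (keyword, " " + keyword):
            for token_id in encode(variant):
                logit_bias[str(token_id)] = bias
    return logit_bias


def tiktoken_encoder(model="gpt-4"):
    """Return an encode function for the model (requires tiktoken to be installed)."""
    import tiktoken  # imported lazily so the helpers above work without it
    return tiktoken.encoding_for_model(model).encode
```

Usage would look like `logit_bias=build_logit_bias(parse_keywords(keywords), tiktoken_encoder())`, passing the dict itself rather than wrapping it in another dict. The leading-space point may also explain the ‘typically’ result: mid-sentence the word is normally produced as the ‘ typically’ variant, so banning a single id can miss it. Biasing subword tokens only nudges probabilities and cannot guarantee that a multi-token keyword appears whole.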

This is a straightforward task. Not sure why you’re facing this issue. Here’s a demo in Playground

You can combine this with @elmstedt's suggestion and get successful completions with the desired keywords.

Yes this works if you use a single instruction to generate the entire article as in your example, but it’s not working when I have multiple instructions. Maybe I’m not using the correct method to achieve what I’m trying to do; if I have ~10 headings in the article and the content under each heading needs its own instructions and formatting, and these keywords need to be used throughout, what’s the best way to do that?

I’ve tried every variation of @elmstedt's suggestions I can think of and more, it’s just not using the words. In fact, it even ignores set word counts and formatting instructions intermittently, which should be even more straightforward.

I had a quick skim through this thread, and I can’t seem to find the answer to a question I have: how many keywords are in the list you give it?

If it’s more than 5-6, it’s almost never going to work. So I just wondered are you giving it 20 keywords? or just 3 or 4?

I made it work to extract skills from commit files and it works 100% correct. Checked thousands of commits and extractions manually and did not find a single wrong skill.

But that’s all I am telling. It is possible. I won’t tell how. Went through many weeks of sleepless nights for that.

If it’s more than 5-6, it’s almost never going to work.

Ah, yeah that might be the issue then, the lists are 20+ keywords

@jochenschultz I’m afraid I don’t know what you’re referring to, skills?

Google doesn’t really go for keyword match anymore. It changed a few months ago. At least that’s the essence of what they said.

What you are looking for is the meaning of the website in a few words.

It should be possible to write content about money without using the word money at all and still rank on the word money.

Try with a website about fruit. Don’t use the word fruit.

Ah yeah, quality content over keywords and all that. Like I said above, that would be fine for future projects, but this one comes with client expectations where the keywords match their SaaS so the article gets a “good grade.”

You and I know there are different/more important ranking factors, but this use case has an additional requirement along with ranking.

And you have no idea how to do that? How did you get the customer?

Well, you could potentially break the command into 4 passes with 5 skills being looked for in each one, then append the results. Keeping the attention narrowed down to just a few things is key. GPT-4 is better with more, but it’s still not limitless.
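Applied to the 20+ keyword article discussed above, the multi-pass idea could be sketched like this: split the list into small batches and hand each batch to one heading, so that every request only has to juggle a few words. The names `chunk`, `plan_passes`, and `section_prompt` are hypothetical, and pairing batches with headings is an assumption about how the passes would map onto the article.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_passes(headings, keywords, per_pass=5):
    """Give each heading a small batch of keywords; every keyword is assigned once."""
    plan = [(heading, []) for heading in headings]
    for i, batch in enumerate(chunk(keywords, per_pass)):
        # Cycle through the headings if there are more batches than headings.
        plan[i % len(headings)][1].extend(batch)
    return plan


def section_prompt(heading, batch):
    """Build the user instruction for one pass."""
    prompt = f"Write the section under the heading '{heading}'."
    if batch:
        prompt += " This section must use every one of these keywords: " + ", ".join(batch) + "."
    return prompt
```

Each `(heading, batch)` pair would then become its own request, with the section outputs appended in order. If there are fewer headings than batches, some headings receive more than `per_pass` keywords, which defeats the purpose, so the batch size should be chosen with the number of headings in mind.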

If you have an example list of your twenty keywords I would be happy to give it a go. Alternately, I can try with a semi-random keyword list but I think it would be more useful for everyone if you had some exemplar we could all play with.

I would genuinely love to take a crack at it. I have a few more ideas about how one might be able to cajole the model into behaving, but I’d like to do a proper test before sharing to avoid muddying the waters.

Word counts it really cannot do and it cannot really be expected to do.

  1. Because it has a tenuous (at best) grasp of the notions of words and numbers.
  2. It “thinks” in tokens not words or characters so it will always find hitting exact word/sentence/paragraph/response lengths impossible.
  3. Hitting those exact counts requires one or more of,
    a. forethought
    b. back-tracking
    c. within-response self-reflection
  4. It’s a chat-based model, fine-tuned by design to emulate natural human communication—not an instruct-based model geared towards rigidly following commands.

Formatting instructions will also occasionally fall by the wayside if the model happens to pick the wrong token at an inopportune moment.

By way of a completely manufactured example,

Example

Say you ask for a numbered list of animals you might find on a farm. The model might begin its response,

Sure! Here is a list of common farm animals:

Then, the model has a choice, maybe the top few next tokens are,

16
5736
7058
91015

Corresponding to,

1
*
-
Cow

Because of the existing context from the user prompt, token 16 will likely be selected 99.99% of the time. But, there is always a non-zero probability one of the other tokens will be selected (unless temperature is set to 0 in which case only the most probable next token is ever selected).

Then, having selected that token, the rest of the response will almost certainly follow whatever format it has started down. E.g. if the model starts with,

* Cow\n

It would be very unlikely to continue to,

* Cow\n2. Chicken\n

because strangely formatted lists like that are likely to appear in the training set in vanishingly small proportions.
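To put rough numbers on the sampling step in this example, here is a small sketch of temperature-scaled softmax over the four candidate tokens. The logit values are invented purely for illustration; real values are not known. It shows how raising the temperature (the thread's code uses 1.2) moves probability away from the top token and toward the off-format alternatives.

```python
import math


def token_probabilities(logits, temperature=1.0):
    """Softmax over logits after dividing by a temperature greater than 0."""
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return {token: w / total for token, w in zip(logits, weights)}


# Made-up logits for the four candidate first tokens of the list.
logits = {"1": 12.0, "*": 3.0, "-": 2.5, "Cow": 2.0}

for temperature in (0.5, 1.0, 1.2):
    probs = token_probabilities(logits, temperature)
    print(temperature, {token: round(p, 6) for token, p in probs.items()})
```

Temperature 0 is not handled by this function, since it would divide by zero; that case corresponds to always picking the highest-logit token.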

That said, most formatting issues can be eliminated with the use of one-shot to few-shot examples in the prompt.
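As a rough illustration of a one-shot example, a sample request and a correctly formatted sample reply can be placed in the message list ahead of the real request. The helper name `with_format_example` and the sample texts below are made up for illustration.

```python
def with_format_example(system_prompt, example_request, example_reply, real_request):
    """Prepend one worked example so the model can copy its formatting."""
    return [
        {"role": "system", "content": system_prompt},
        # The example pair shows the exact format wanted.
        {"role": "user", "content": example_request},
        {"role": "assistant", "content": example_reply},
        # The real request follows the example.
        {"role": "user", "content": real_request},
    ]


messages = with_format_example(
    "Generate articles that follow the user's instructions exactly.",
    "Describe why each material is needed: sandpaper",
    "**Sandpaper**\nSmooths the surface so the paint adheres evenly.",
    "Describe why each material is needed: primer, paint roller",
)
```

Adding more example pairs turns this into a few-shot prompt, at the cost of extra tokens in every request.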

Incidentally, if you can establish some set of minimally not-working examples, I think you have the basis for a very good eval you could submit.

Eval idea

Generalize it to just writing a text passage about X which must incorporate a set of n words; where n varies from, say, 3 to 30.

It’s incredibly easy to test against with simple regex.
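A minimal sketch of that regex check follows. The function name `missing_keywords` is made up, and matching whole words case-insensitively is an assumption about how such an eval (or the client's SaaS) would count a keyword as present.

```python
import re


def missing_keywords(text, keywords):
    """Return the keywords that do not appear in text as whole words or phrases."""
    missing = []
    for keyword in keywords:
        # Lookarounds keep 'coat' from matching inside 'coats'.
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if not re.search(pattern, text, flags=re.IGNORECASE):
            missing.append(keyword)
    return missing
```

Running this on each generated article, with n keywords ranging from 3 to 30, would give a pass rate per list size. The same check could also drive a retry that asks the model to revise only for the keywords still missing.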
