making the AI say words it had no reason to interject is a challenge

Isn’t this the point of logit_bias though? Also, it does have a reason to interject them in most cases, i.e., ‘fridge’ instead of ‘refrigerator’.

I don’t believe this is something I can instruct. The keywords change every article and they have to be specific for SEO purposes.

With the API, the instructions can change every article and API call, the particulars of operation can be by injected text, and you can have a wall of behavior spelled out to get that list of new words in the new article.

You might even SEO just by telling the AI it is a SEO optimizer and have it extract web metadata and enhance the value of the first paragraphs. SEO is not simply spamming Google with irrelevant words. Search indexers also have intelligence.


I know what SEO is, hence why I’m working on a program that writes SEO optimized articles. The keywords come from research done prior to article generation, and they must be these specific keywords, not whatever the API feels like spitting out.

the particulars of operation can be by injected text, and you can have a wall of behavior spelled out to get that list of new words in the new article.

Yes, this is what I’ve tested 20 different ways, including your suggestions above, and it’s not working. Whether it be a system message, user message, telling it to rewrite the article and inject keywords, etc., it seems to be ignoring that instruction completely, or at least not knowing what to do with it.

Yeah, that’s too much for it. Pare them down. If you’re trying to get it to generate 13 separate articles, then do them all as separate queries. If they need to be related, you can ask it to summarize between each call. So, do article 1, then as a separate call to the API, “summarize this article. Be concise but ensure key details are included. Shorten it significantly.”; then do article 2 and include the summary as "summary of article series so far: " in your system prompt. After you get the second article, you send over that summary and the new article for summarization. Wash, rinse, repeat. This might also save you money because you can do the summarization with a cheaper model.
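The summarize-between-calls loop described above could be sketched roughly like this. It is a minimal sketch, not a tested pipeline: `write_article` and `summarize` are hypothetical callables standing in for whatever API calls you make (the second could use a cheaper model), and the name `write_series` is made up for illustration.

```python
from typing import Callable, List


def write_series(
    topics: List[str],
    write_article: Callable[[str, str], str],
    summarize: Callable[[str], str],
) -> List[str]:
    """Generate one article per topic, carrying a running summary between calls.

    write_article(system_prompt, topic) and summarize(text) are placeholders
    for your own API calls; they are assumptions, not library functions.
    """
    articles: List[str] = []
    summary = ""  # nothing to summarize before the first article
    for topic in topics:
        # The summary so far is injected into the system prompt of the next call.
        system_prompt = (
            f"Summary of article series so far: {summary}" if summary else ""
        )
        article = write_article(system_prompt, topic)
        articles.append(article)
        # Fold the new article into the running summary for the following call.
        summary = summarize(f"{summary}\n\n{article}".strip())
    return articles
```

Each article is a separate query, so no single call has to juggle more than one article's worth of instructions.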

Also, @_j has a great point with “You might even SEO just by telling the AI it is a SEO optimizer…”, I bet that would improve the quality of the content a bunch, once you split this monster query into a bunch of smaller ones.

~

You need to separate the logic from the keywords, not just in the code but in your mind. The keywords are interchangeable and incidental; they are just a bit of fuel to get an article to come out of the bot the way you want.

You might have more luck by giving it more keywords than you want to appear and letting the bot pick them. Also, if you tell it it’s an SEO guru it might start swapping words out if it “thinks” they would fit better. I wouldn’t immediately discard that; check with your favorite keyword software/site to see if they actually are better. Sometimes it’s surprisingly good at picking paths I don’t see, and while I haven’t done SEO with it, I really do bet it’d be good at it.


Without knowing your keywords, I can’t comment on the best solution, but my advice would be to just do it in steps:

  1. First, tell the model to generate a list of sentences on the topic, one per keyword. This is very easy for GPT in general.

  2. Next, ask the model to use all of those sentences in an article on the topic. Here, I get GPT to use the following keywords: (“breakfast cereal”, “basketball”, “flashlight”, “pajamas”, “furniture”) in an article about graphics cards. It’s a pretty bad article. You can probably improve it easily by providing some system-message-style direction on style, tone, intended audience, etc.

https://chat.openai.com/share/bd4f5948-53f9-428a-a2ac-27b9191f3638

This works because the first task is something that isn’t hard. And, once you have those sentences made, the model has a path through its weights and probabilities for tying the intended topic to the keyword every time because it already has the sentences. It can even bend them to flow more naturally on the second generation.

Trying to do the generation all in one go is going to be a task that LLMs are specifically ill-suited to achieve because it forces the generation to keep shifting and jumping to different probability spaces in a way that nothing in its training data gives it a roadmap to follow. So, just give it the roadmap ahead of time (that it gives to itself!), and it’ll find its own way on the second go-around.
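As a rough illustration of the two steps above, the sketch below builds the two prompts and chains them. The prompt wording is my own, and `complete` is a hypothetical callable that sends one prompt to the model and returns its text; swap in your own API call.

```python
from typing import Callable, List


def sentence_prompt(topic: str, keywords: List[str]) -> str:
    """Step 1: ask for one on-topic sentence per keyword."""
    listing = "\n".join(f"- {kw}" for kw in keywords)
    return (
        f"Write one sentence about {topic} for each keyword below. "
        f"Each sentence must contain its keyword exactly as written.\n{listing}"
    )


def article_prompt(topic: str, sentences: str) -> str:
    """Step 2: ask for an article that works every generated sentence in."""
    return (
        f"Write an article about {topic} that uses all of the following "
        f"sentences. You may adjust them so they flow naturally, but keep "
        f"their keywords.\n{sentences}"
    )


def two_step_article(
    topic: str, keywords: List[str], complete: Callable[[str], str]
) -> str:
    # First call: the easy task, one sentence per keyword.
    sentences = complete(sentence_prompt(topic, keywords))
    # Second call: the model reuses its own sentences as a roadmap.
    return complete(article_prompt(topic, sentences))
```

Style, tone, and audience direction would go in a system message on the second call.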


Why? Maybe it was just what my ChatGPT defaulted to and you used GPT-3, but if not, well… ChatGPT with GPT-4 is perfectly capable of generating an acceptable, even quite good, article. Having it generate sentences without knowing that they’re going to be in a single article, and then trying to get it to shoehorn them into an article, leads to what you got: a weirdly disjointed mishmash of paragraphs with keywords sort of stuffed in there. Like, who thinks of “comfort” with a graphics card?

Not trying to be a dick, but if you’ve got GPT-4 this is both unnecessary and detrimental, imo. If you want to do this method, generating sentences beforehand, you need a lot of different keywords, then bucket related keywords together and determine a topic beforehand. Iteratively, so it doesn’t just list the keywords together either.

Check out this code interpreter completion of my original example. It could 100% be cleaned up to improve it for unguided automation, but I also wanted to give some examples of ways to get the bot to be an effective writer: AI Advancements: GPT-Nvidia Synergy

Compare that with this, which was built off of the link you sent: GPU Insights Unveiled which took longer and was more convoluted.

If you’re trying to get it to generate 13 separate articles, then do them all as separate queries.

Oh actually I meant there are 13 more pairs of instructions per 1 article. It’s essentially 1 set of instructions for each heading and content underneath, with a few others like these keyword instructions, for a total of 13-14 sets of user/assistant instructions per 1 article. EDIT: This has since been changed to just 13 user instructions instead of 13 sets of user/assistant instructions.

Also, @_j has a great point with “You might even SEO just by telling the AI it is a SEO optimizer…”, I bet that would improve the quality of the content a bunch, once you split this monster query into a bunch of smaller ones.

Unfortunately for my use case, the keywords have to be a specific list I give it because the article is then checked against a different SaaS (not mine) to ensure these keywords are present. This step is client facing, so even if the generated article is SEO optimized, if it doesn’t contain the keywords I specify, their SaaS marks it as a ‘problem’ or ‘not optimized.’ Somewhat annoying since they’re SEO optimized either way I know, but such is the request I’ve been given :smiling_face_with_tear:
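Since the article is graded on whether the exact keywords are present, one option is to run the same kind of check locally before anything reaches the client-facing tool. The sketch below is only an assumption about how such a check might work (whole-word, case-insensitive matching); the third-party SaaS may match differently, and `missing_keywords` is an illustrative name.

```python
import re
from typing import List


def missing_keywords(article: str, keywords: List[str]) -> List[str]:
    """Return the keywords that never appear in the article as whole words or phrases."""
    missing = []
    for kw in keywords:
        # Lookarounds instead of \b so keywords ending in punctuation still work;
        # 'coat' will not match inside 'coats'. Matching is case-insensitive.
        pattern = r"(?<!\w)" + re.escape(kw.strip()) + r"(?!\w)"
        if not re.search(pattern, article, flags=re.IGNORECASE):
            missing.append(kw)
    return missing
```

Anything this returns could be fed back to the model in a follow-up request that names only the missing keywords.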

I do have an upcoming project that I’ll be using your suggestions for, however. Much easier when there won’t be clients involved in the process.

I’m curious about your usage of user/assistant instruction pairs over system message. Do you mean that you’re providing the assistant response, or are you describing a chat history where the assistant answer is fed back 13-14 times as the answer is refined?


If you list all of your instructions and a set of actual keywords I can help you with your issue.


They aren’t mutually exclusive; you can tell it both to be an SEO master and also explain and command it to use exactly those keywords. Something like starting the overall prompt with “You’re an SEO guru and I need your help.” Then explain the general trend of the goal, but don’t give explicit instructions yet. Next, explain something like, “To do this I have a set of keywords, [insert list here], and I need them to all be present in the output. Please include them organically and in an SEO optimized way.” Then you explain the details (your other rules), and at the end you add your imperatives (i.e., your commands, voiced as orders, not requests), like “The output must include exactly the list of SEO keywords I gave you, which was [insert list again], and they must appear organically as if they just came to your mind as you were writing.”

I’ll revisit this thread this evening and try to provide a concrete example. It would help to have some example of your other 13 constraints, or things like them. Also, that’s really too many instructions to expect a good one-shot article to result; you should find ways to combine or eliminate some of them. PM me if you want help brainstorming how to do that but don’t want to post them publicly.

Here is some code I used to specify 21 tokens as the only available tokens to be returned (bias value of 100 across the board). I haven’t experimented much with logit_bias, but here is some starter code to get the token ids.

import tiktoken

encoding = tiktoken.encoding_for_model("ada")

def get_token_id(word):
    token_ids = encoding.encode(word)
    return token_ids[0] if token_ids else None

words = [
    '0', ' 0', '0 ',
    '1', ' 1', '1 ',
    '2', ' 2', '2 ',
    '3', ' 3', '3 ',
    '4', ' 4', '4 ',
    '5', ' 5', '5 ',
    '6', ' 6', '6 '
]
bias = {str(get_token_id(word)): 100 for word in words}

You didn’t read more than the headline. The point here is that the contents need to remain. However, it is just fine if they are rewritten by an AI, have awkward words or phrases inserted, don’t sound natural or preserve the original intent - the only purpose is to spam up the internet.

From earlier in the thread:

This is one of the options I’ve tried, but it didn’t seem to work either. After a little more research, I think I have to use logit_bias for this, but I’m not sure how to programmatically tokenize keyword list inputs.

Should have replied to that directly. My use case was different from what is trying to be accomplished in the thread.

I’m torn between “I’d rather not have my brain assaulted by terribly written everything, at least make it read well” and “if it’s all obviously AI-written I can mentally filter it out and ignore it more easily”.

Sure, let me post a little more for clarity. I can’t post the complete code or specifics of what it’s generating, but here’s what I’m doing.

Here is the web app side where I input the article’s contents in a form.

main.py

import chat_functions
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def generate_article():
    if request.method == 'GET':
        return render_template('index.html')

    title = request.form.get('title').title()
    materials = request.form.get('materials')
    instructions = request.form.get('instructions')
    keywords = request.form.get('keywords')

    content = chat_functions.ArticleGenerator(title, materials, instructions, keywords)
    article = content.generate_article()

    return render_template('index.html', article=article)

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8080, debug=False)

The chat_functions being imported in main.py above are my GPT instructions, which look like the following (please excuse the text formatting). Small side note, I changed the 13 pairs to just 13 user instructions, I’ll edit my comment above.

chat_functions.py

import openai
import os
from dotenv import load_dotenv

class ArticleGenerator():
    def __init__(self, title, materials, instructions, keywords):
        self.title = title
        self.materials = materials
        self.instructions = instructions
        self.keywords = keywords
        load_dotenv()
        openai.api_key =  os.getenv('OPENAI_API_KEY', 'xxxxxxx')
        self.messages = [
        {"role": "system", "content": f'''Generate articles that follow the users instructions exactly.'''},
        {"role": "user", "content": f'''I want you to write an article for a {title} project according to my instructions and formatting rules.'''},
        {"role": "user", "content": f'''First, write {title} as a first level heading. Under that heading, write an introduction to the project.'''},
        {"role": "user", "content": f'''Next, write 'What is {title}?' as a second level heading. Under that heading, write about what {title} is.'''},
        {"role": "user", "content": f'''Next, write 'What You Need to Make {title} At Home' as a second level heading. Under that heading, take this list of materials
                                            and write about why each material is needed for the project: {materials}. Bold the name of the material, then write
                                            no more than 40 words about why it is needed. Separate each material and its description with a line break.
                                            Do not include the material's quantity.'''},
        {"role": "user", "content": f'''Next, write 'Tips for Making the Best {title}' as a second level heading. Under that heading, include 5-7 points about how to build the best {title}.'''}
        ]

    def generate_article(self):
        completion = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=1.2,
            presence_penalty=0.0,
            frequency_penalty=0.0,
            logit_bias={48126: -100},
            messages=self.messages
            )
        return completion.choices[0].message.content

These aren’t the complete instructions, but this is basically how I’ve got it working now. Each instruction creates an H1, H2, or H3, then writes and formats the content for each heading. This part is working great; it writes relevant content that is formatted how I specify. The issue is, while it’s writing these sections of content, I need it to use specific keywords, which I can’t get it to do.

These keywords are always relevant to the topic of the article, I just need the API to actually use them. For example, say the article title is ‘Painted DIY Bookshelves’ and a keyword is ‘coats’ (as in coats of paint). The API could either write a new sentence using the word (i.e., ‘Apply two coats of paint, letting it dry in between’), or it could replace an existing word with the keyword when it can be used interchangeably (i.e., using ‘coats’ instead of ‘layers’). It’s doing neither currently.
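If the model keeps choosing ‘layers’ over ‘coats’, the interchangeable-word case described above could also be handled as a post-processing step rather than in the prompt. This is a hedged sketch only: the `replacements` map is hypothetical and would have to come from your own keyword research, and a blind swap can change meaning, so the result still needs a read-through.

```python
import re
from typing import Dict


def swap_in_keywords(article: str, replacements: Dict[str, str]) -> str:
    """Replace whole-word occurrences of each synonym with its required keyword.

    `replacements` maps the word the model tends to write (e.g. 'layers')
    to the keyword that must appear (e.g. 'coats'). Keywords must be non-empty.
    """
    for old, new in replacements.items():
        pattern = re.compile(
            r"(?<!\w)" + re.escape(old) + r"(?!\w)", flags=re.IGNORECASE
        )

        def _sub(match: "re.Match[str]", new: str = new) -> str:
            found = match.group(0)
            # Keep sentence-initial capitalization when the original had it.
            return new[0].upper() + new[1:] if found[0].isupper() else new

        article = pattern.sub(_sub, article)
    return article
```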

@chrstfer I’m going to completely redo the chat instructions following your recommendations today, so I’ll update or send a PM to take you up on your brainstorming offer. Thanks!

@_j No offense, but you are not understanding the point of this thread and it’s convoluting the conversation. What you’re talking about is keyword stuffing, which Google has been penalizing for years. What I’m talking about is keyword optimizing, which is still very much a ranking factor. Having this program write ‘fridge’ instead of ‘refrigerator’, or create new sentences using keywords that are relevant to the topic, is neither spam nor too much to ask from ChatGPT. Every website anywhere near the top 5 pages of Google’s SERPs uses some variation of keyword optimization to help get there, which is why I’m not here to debate SEO strategy.

@ethan.peck I appreciate the example! If I can’t get this to work with @chrstfer’s suggestions, I’ll try this out with the logit_bias.


So here’s what I tried:

{"role": "system", "content": f'''You are an SEO specialist and I need your help writing articles. These need to be well written articles about [blank] that are SEO optimized.'''},
{"role": "user", "content": f'''To do this, I have a set of keywords and phrases, {keywords}, that need to be included in the article. Please include them organically and in an SEO optimized way..'''},

# Examples of these instructions can be found in my comment above
{"role": "user", "content": '''Article instructions here using 10 separate user instructions'''}, 

{"role": "user", "content": f'''“The output must include exactly the list of SEO keywords and phrases I gave you, which was {keywords}. They must appear organically as if they just came to your mind as you were writing.”'''},
{"role": "user", "content": f'''Write this [blank] article according to my instructions.'''},

I also tried combining the first user instruction with the system instruction, but neither worked. I generated 3 articles with these instructions, then took the {keywords} instructions out and generated them again, and the results were pretty identical.

To be clear, it’s writing a very nice article with other relevant keywords, just not the ones I specify.

@ethan.peck I tried your code above and couldn’t get it to work. Here’s my implementation, maybe I did something incorrectly.

import tiktoken
import openai

keywords = request.form.get('keywords')

encoding = tiktoken.encoding_for_model("gpt-4")

def get_token_id(word):
    token_ids = encoding.encode(word)
    return token_ids[0] if token_ids else None

    words = keywords
    bias = {str(get_token_id(word)): 100 for word in words}

class ArticleGenerator():
    def __init__(self, bias):
    self.bias = bias
    self.messages= #Chat instructions here

def generate_article(self, bias):
    completion = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=1.2,
        presence_penalty=0.0,
        frequency_penalty=0.5,
        logit_bias={bias: 100},
        messages=self.messages
        )
    return completion.choices[0].message.content

content = chat_functions.ArticleGenerator(bias)
article = content.generate_article()

EDIT: It looks like logit_bias isn’t working at all actually. I ran this a second time to exclude just the token for the word ‘typically’ like this

logit_bias={48126: -100},

but the generated content still included ‘typically’ 3 times.
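A few things in the implementation above, as posted, would keep it from doing what was intended: `keywords` arrives from the form as one string, so iterating over it yields characters rather than keywords; the `words` and `bias` lines sit after the `return` and never run; only the first token of each word is kept; and `logit_bias={bias: 100}` nests the dict as a key instead of passing it directly. The sketch below shows one way the bias map might be built. It assumes a comma-separated keyword field and takes the tokenizer as a parameter, and the names `parse_keywords` and `build_logit_bias` are illustrative, not part of any library. On the ‘typically’ test, it may be worth checking that the ID came from the model’s own encoding, since a word with a leading space is a different token from the same word without one.

```python
from typing import Callable, Dict, Iterable, List


def parse_keywords(raw: str) -> List[str]:
    """Split a comma-separated form field into individual keywords (assumed format)."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_logit_bias(
    keywords: Iterable[str],
    encode: Callable[[str], List[int]],
    bias: int = 5,
) -> Dict[str, int]:
    """Map every token of every keyword, with and without a leading space, to `bias`.

    `encode` is any function that turns text into token IDs for the target model.
    """
    logit_bias: Dict[str, int] = {}
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue
        # Words in the middle of a sentence are usually tokenized with a
        # leading space, so bias both spellings.
        for variant in (keyword, " " + keyword):
            for token_id in encode(variant):
                logit_bias[str(token_id)] = bias
    return logit_bias
```

With tiktoken, `encode` could be `tiktoken.encoding_for_model("gpt-4").encode`, and the result would be passed as `logit_bias=build_logit_bias(...)`. The API documentation describes values near 100 as exclusive selection, which suits the digits-only use case earlier in the thread but not an article, hence the small default here. Biasing token pieces nudges the model toward them; it does not guarantee the whole keyword appears.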

This is a straightforward task. Not sure why you’re facing this issue. Here’s a demo in Playground

You can combine this with @elmstedt’s suggestion and get successful completions with the desired keywords.


Yes this works if you use a single instruction to generate the entire article as in your example, but it’s not working when I have multiple instructions. Maybe I’m not using the correct method to achieve what I’m trying to do; if I have ~10 headings in the article and the content under each heading needs its own instructions and formatting, and these keywords need to be used throughout, what’s the best way to do that?
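One possible answer to the question above, offered as a sketch rather than a verified fix: make one request per heading and give each request only a few of the keywords, so no single call carries ten formatting rules plus the full list. The round-robin split below is an arbitrary assumption (assigning keywords to the sections where they fit topically would likely read better), and `assign_keywords` and `section_prompt` are hypothetical helper names.

```python
from typing import Dict, List


def assign_keywords(headings: List[str], keywords: List[str]) -> Dict[str, List[str]]:
    """Spread keywords across headings round-robin so each call handles only a few.

    Assumes `headings` is non-empty and contains no duplicates.
    """
    plan: Dict[str, List[str]] = {heading: [] for heading in headings}
    for i, keyword in enumerate(keywords):
        plan[headings[i % len(headings)]].append(keyword)
    return plan


def section_prompt(heading: str, instructions: str, keywords: List[str]) -> str:
    """Build the user message for a single section of the article."""
    prompt = f"Write '{heading}' as a second level heading. {instructions}"
    if keywords:
        quoted = ", ".join(f'"{kw}"' for kw in keywords)
        prompt += (
            f" The text under this heading must contain these exact keywords: {quoted}."
        )
    return prompt
```

Each section's output can then be checked for its own short keyword list and regenerated on its own if something is missing, without touching the rest of the article.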

I’ve tried every variation of @elmstedt’s suggestions I can think of and more; it’s just not using the words. In fact, it even ignores set word counts and formatting instructions intermittently, which should be even more straightforward.

I had a quick skim through this thread, and I can’t seem to find the answer to a question I have: how many keywords are in the list you give it?

If it’s more than 5-6, it’s almost never going to work. So I just wondered, are you giving it 20 keywords, or just 3 or 4?


I made it work to extract skills from commit files and it works 100% correct. Checked thousands of commits and extractions manually and did not find a single wrong skill.

But that’s all I am telling. It is possible. I won’t tell how. Went through many weeks of sleepless nights for that.