How do I get rid of blank space characters in prompts?

Apologies up front if this is a more general programming question.

I’m using the API with Python, and here is an example of one of my user message prompts with fairly standard Python indentation (one tab of indent per line, though it looks like more here):

completion = openai.ChatCompletion.create(
                model=gpt_model,
                temperature=0.7,
                logit_bias=bias_words,
                    messages=[{"role": "user", "content": f"""
                                    Write '<h3>Title:</h3>' in title case. 
                                    Under that heading, write a 60-character SEO-optimized title for {article_title}. Write five different ones for {article_title}.
                                    Examples: 
                                    The Best Running Shoes of 2023 (Comfortable & Stylish!)
                                    Classic Truffle Pasta (Super Easy, 30-Minute Vegan Recipe)
                                    Top 10 Easiest Plants for a Backyard Garden
                                    """
                                    }]
            )
            return completion.choices[0].message.content

The issue is, when I submit this as a prompt, there are 26 empty spaces after each line of text that are counting towards the total token count.

I don’t want to jam all the prompt text onto one line because it has a tendency to ruin the format of the output. Is there a better way to go about formatting my prompts or something else I can do to get it to stop submitting blank spaces to the prompt after each line?

Thanks

With a triple-quoted string, everything between the triple quotes is included, so every bit of indentation you show is sent.

You can call the .strip() method on the string to remove only the whitespace at its start and end. This lets you keep a clear presentation where all the text sits in one readable block.

multi_line = """

This is the text.
I also write a second line

The end is clear and separated from code also.

""".strip()

strip() removes the line feeds and spaces just before and after the contents, so my string starts exactly at the word “This…”.

With .strip() alone, you cannot indent the text inside the string, though; any indentation there will appear in the output.
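If you do want the prompt text indented to match your code, the standard library's `textwrap.dedent` can remove the indentation shared by every line before `.strip()` trims the ends. This is a minimal sketch using the prompt from the question; `build_prompt` and the sample title are hypothetical names for illustration.

```python
import textwrap


def build_prompt(article_title: str) -> str:
    # Hypothetical helper. The prompt lines are indented to match the code;
    # dedent() removes the leading whitespace shared by every line,
    # then strip() drops the empty first line and the trailing newline.
    return textwrap.dedent(f"""
        Write '<h3>Title:</h3>' in title case.
        Under that heading, write a 60-character SEO-optimized title for {article_title}.
        Write five different ones for {article_title}.
        Examples:
        The Best Running Shoes of 2023 (Comfortable & Stylish!)
        Classic Truffle Pasta (Super Easy, 30-Minute Vegan Recipe)
        Top 10 Easiest Plants for a Backyard Garden
        """).strip()


# For comparison: strip() alone only trims the ends, so the
# indentation of every line after the first is still sent.
stripped_only = """
        first line
        second line
        """.strip()

prompt = build_prompt("indoor herb gardens")
print(prompt)
print(repr(stripped_only))
```

Note that `dedent` only removes whitespace that every non-blank line has in common, so mixing tabs and spaces in the indentation can leave some of it in place.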

Within the parentheses, indentation is arbitrary, but it should be readable and Pythonic. However, you’ve made a fatal mistake: the “return” line is the next statement after “completion = …”, so it must go back to the same indentation level as that line. Indenting the contents of the parentheses doesn’t change where the next statement has to start.

I fixed your code and the string, then ran it through the Black formatter to clean up the indentation for readability.

completion = openai.ChatCompletion.create(
    model=gpt_model,
    temperature=0.7,
    logit_bias=bias_words,
    messages=[
        {
            "role": "user",
            "content": 
f"""
Write '<h3>Title:</h3>' in title case. 
Under that heading, write a 60-character SEO-optimized title for {article_title}. Write five different ones for {article_title}.
Examples: 
The Best Running Shoes of 2023 (Comfortable & Stylish!)
Classic Truffle Pasta (Super Easy, 30-Minute Vegan Recipe)
Top 10 Easiest Plants for a Backyard Garden
""".strip(),
        }
    ],
)
return completion.choices[0].message.content

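If you like the look of indentation for readability, you can wrap the string in parentheses and write it as a series of adjacent string literals. Python joins adjacent literals into one string, and the parentheses let them span several lines (implicit line continuation).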

completion = openai.ChatCompletion.create(
    messages=[
        {
            "role": "user",
            "content": (
                "Write '<h3>Title:</h3>' in title case.\n"
                f"Under that heading, write a 60-character SEO-optimized title for {article_title}.\n"
                f"Write five different ones for {article_title}.\n"
                "Examples:\n"
                "The Best Running Shoes of 2023 (Comfortable & Stylish!)\n"
                "Classic Truffle Pasta (Super Easy, 30-Minute Vegan Recipe)\n"
                "Top 10 Easiest Plants for a Backyard Garden\n"
            ),
        }
    ],
)
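Because adjacent literals are joined into one string, each piece that contains a `{placeholder}` needs its own `f` prefix; a plain literal sends the braces to the model as literal text. A small sketch to check this, with a hypothetical `article_title` value:

```python
article_title = "indoor herb gardens"  # hypothetical example value

# Adjacent literals inside parentheses are concatenated into one string.
# The indentation sits outside the quotes, so none of it is sent.
# Only the piece with a placeholder needs the f prefix.
with_f = (
    "Examples:\n"
    f"Write a title for {article_title}.\n"
)

# Without the f prefix, "{article_title}" stays in the text verbatim.
without_f = (
    "Examples:\n"
    "Write a title for {article_title}.\n"
)

print(repr(with_f))
print(repr(without_f))
```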

Agree with @_j; I just want to point one thing out.

The indentation barely counts toward your tokens: a run of spaces is encoded compactly, so it costs roughly one token per line rather than one per space. It does influence the model, but probably only slightly.

Still, it’s definitely a better idea to leave it out. Who knows, it could confuse the model, since long indentation is commonly associated with code.

If you want stupid token tricks, you CAN indent by one space. This makes the tokens used “starts with a space” words rather than “beginning of line” words, and the former appear much more often in the training corpus, which may help understanding.
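If you want to try the one-space indent, the standard library's `textwrap.indent` can add it to every non-blank line once the prompt has been stripped. A minimal sketch with a hypothetical short prompt; whether the extra space actually helps the model is the suggestion above, not something measured here.

```python
import textwrap

# Hypothetical prompt, already free of leading indentation.
prompt = "Write five titles.\nExamples:\nTop 10 Easiest Plants for a Backyard Garden"

# Prefix each line with exactly one space. By default, lines that
# consist only of whitespace are left without the prefix.
one_space = textwrap.indent(prompt, " ")
print(one_space)
```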

Ahh that’s much cleaner, thanks @_j, I appreciate it!

Oh, and in the final example, put the f-string’s “f” back on the line with the string that needs it!