System Prompt (Sending each time instead of once)

Hi, I am new to the OpenAI API.
I am trying to make a script to help me create a long-form article.
Here is the process:
I provide 2000 words of content about the topic via a text file and have my script use that information as the system prompt.

Next, I ask it to create a title and then a meta description.
Next, I ask it to grab the outline from a text file and process each heading one by one. So far, all is good.

Now, my problem is that each time the script sends a request to write the title, meta description, or headings … it also sends the full system prompt (2000 words) with each request.

Is there a way to send the system prompt once at the beginning, so that every time I ask for something to be written, it reads from that first sent prompt?

Unfortunately not. You will have to send the full prompt along with every API call.

However, it sounds like you might be able to consolidate your tasks into a single request and just ask the model to return a title, meta description, etc. in one go. Have you tried that?

You have to send the prompt alongside each call in order for the API to understand the context, because any two API calls are always independent of each other.

Yeah, I tried doing it in one go … the output is not good :frowning:
Can I send the 2000-word article context as user input in place of a system prompt, and then reuse that same user input again and again?

I mean, in claude.ai we can give it a big doc and ask Claude to use it again and again …

I mainly want to reduce the input token cost :wink:

Ahhh, so how do other people build chatbots? Do they have to send all the previous chats again and again?

I have not used the Claude API myself, but I doubt that the logic of the API call would be different. Are you talking about the API or the chat interface?

As for your question: the logic is the same regardless of whether it is a system or user prompt. You have to supply it with every API call.

Normally, the system prompt contains instructions for the task, and the user prompt carries the more detailed inputs. In your case, for example, I would expect the article to be supplied as part of the user prompt.

Perhaps you can share examples of your system/user prompts and we can take a look at whether they can be optimized?
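For instance, the split described above could look like this (a minimal sketch; build_messages and the exact prompt wording are just illustrative, not from the original script):

```python
# Keep the short task instructions in the system prompt and put the
# large background text in the user prompt, as suggested above.

def build_messages(background_info: str, task: str) -> list[dict]:
    """Assemble a chat-completion message list for one task."""
    system_prompt = (
        "You are a helpful writing assistant. Follow the task in the "
        "user message, using only the supplied background information."
    )
    user_prompt = (
        f"Background information:\n'''{background_info}'''\n\n"
        f"Task: {task}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "Raccoons are nocturnal mammals...",
    "Write a title and meta description.",
)
```

The instructions stay small and stable, while the large background text travels in the user turn, which makes it easier to swap per article.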

Depends on which version of the API you are using. With the older version, without threads, you basically either pass the whole conversation or, if there is a context-window problem, generate a summary and pass that off as "context from the previous question".
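That summarize-and-pass-it-off approach can be sketched like this (compact_history is a hypothetical helper, and summarize() is stubbed here; in practice it would be another model call):

```python
def summarize(turns: list[dict]) -> str:
    # Stand-in for a real chat-completion call that condenses old turns;
    # stubbed so the control flow is visible without an API key.
    return " / ".join(t["content"][:40] for t in turns)

def compact_history(history: list[dict], keep_last: int = 2) -> list[dict]:
    """Replace all but the last `keep_last` turns with one summary turn."""
    if len(history) <= keep_last:
        return history
    summary = summarize(history[:-keep_last])
    context_turn = {
        "role": "user",
        "content": f"Context from the previous conversation: {summary}",
    }
    return [context_turn] + history[-keep_last:]

history = [{"role": "user", "content": f"question {i}"} for i in range(6)]
compacted = compact_history(history)  # 1 summary turn + last 2 turns
```

This keeps each request bounded in size at the cost of some fidelity in the older turns.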

In the newer version, you can create an agent using a thread. The thread can act as a single conversation.

Any example or doc link? I'm currently using response = openai.ChatCompletion.create

def generate_title_and_meta(keyword, background_info):
    prompt = f"Generate a title and meta description for an article about '{keyword}' based on the following background information:\n\n{background_info}\n\nTitle:\nMeta Description:"
    log_api_request("/v1/chat/completions", {"model": AI_MODEL, "messages": [{"role": "system", "content": "You are a helpful assistant that generates article titles and meta descriptions."}, {"role": "user", "content": prompt}], "temperature": 0.7, "max_tokens": 100})
    response = openai.ChatCompletion.create(
        model=AI_MODEL,
        messages=[
            {"role": "system", "content": "You are a helpful assistant that generates article titles and meta descriptions."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=100
    )
    return response

and

def generate_section_content(keyword, background_info, section_outline):
    prompt = f"Write a blog post section in a plain text format focusing on '{section_outline}' for the article '{keyword}'. Please use {background_info}. Please limit the content to 5 to 8 sentences or 3 to 4 very small, easy-to-read paragraphs. Additionally, include an informative section in an enthusiastic, engaging tone about '{section_outline}'. Maintain an upbeat, conversational style throughout that informs readers while keeping their interest."
    log_api_request("/v1/chat/completions", {"model": AI_MODEL, "messages": [{"role": "system", "content": "You are a helpful assistant that generates article section content."}, {"role": "user", "content": prompt}], "temperature": 0.7, "max_tokens": 400})
    response = openai.ChatCompletion.create(
        model=AI_MODEL,
        messages=[
            {"role": "system", "content": "You are a helpful assistant that generates article section content."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=400
    )
    return response

https://platform.openai.com/docs/assistants/how-it-works/creating-assistants

Here’s a link to the Assistants API documentation, you’re currently calling the ChatCompletions API.

The way chat completions works is that it sends the entire conversation with each message.

So by the 4th message you're sending the first 3 messages from both you and the bot, as well as the system prompt, in every one of those requests:
P + user + bot + P + user2 + bot2 + P + user3 + bot3, etc.
So that's why chatbots are a bit pricey to run. The Assistants API claimed to help with that by introducing threading, but it's been pretty buggy so far (last I heard; someone could correct me on that). I think there is a way to cut down how many tokens you send each time using RAG, but I'm not necessarily sure how.
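A quick sketch of that growth, with send() stubbed in place of the real API call (the 3 questions are just placeholders):

```python
# The client keeps the whole conversation locally and resends all of it
# on every request, so the number of messages sent grows each turn.

def send(messages: list[dict]) -> str:
    # Stub standing in for openai.ChatCompletion.create
    return f"reply to: {messages[-1]['content']}"

system_prompt = {"role": "system", "content": "2000-word background..."}
history = [system_prompt]
sent_sizes = []

for question in ["title?", "meta description?", "section 1?"]:
    history.append({"role": "user", "content": question})
    sent_sizes.append(len(history))  # everything so far is sent again
    history.append({"role": "assistant", "content": send(history)})

# sent_sizes grows by 2 each turn: [2, 4, 6],
# and the system prompt is part of every one of those requests.
```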

This can be further optimized and tailored to your specific needs, but it should give you an idea. Note that I used a curl request for testing, so the conventions are slightly different. The background info was about 2,000 words from a Wikipedia article on raccoons, plus, separately, a very simplified outline.

{
    "model": "gpt-4-0125-preview",
    "messages": [
        {
            "role": "system",
            "content": "You are an AI assistant specialized in creating posts for an animal blog. Your role is to complete three interrelated tasks. Task (1): You write a blog post section based on the background info and the section outline provided. The post consists of 5-8 informative sentences, written in an enthusiastic and engaging tone; Task (2): You create a title for the post; Task (3) You write a meta description based on the post."
        },
        {
            "role": "user",
            "content": "Background info: '''BackgroundInfo'''; Outline: '''Outline'''"
        }
    ],
    "max_tokens": 2000,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "temperature": 0.7,
    "stop": null
}

This returned the following:

Title:

“The Remarkable Raccoon: A Deep Dive into the Life of North America’s Mischievous Bandit”

Blog Post Section:

The raccoon, a symbol of adaptability and cunning, has fascinated humans across North America and beyond for centuries. Known scientifically as Procyon lotor, this creature is not only the largest member of the procyonid family but also one of the most recognizable due to its distinctive facial mask and ringed tail. These features have not only captured our imagination but have also played a significant role in the mythologies of Indigenous peoples. What makes the raccoon truly remarkable, however, is its intelligence. Studies have shown that raccoons possess an extraordinary memory, capable of recalling solutions to tasks for up to three years. This nocturnal mammal, with its diet comprising invertebrates, plants, and vertebrates, showcases a remarkable adaptability in both diet and habitat. Originally from deciduous and mixed forests, raccoons have extended their range to mountainous areas, coastal marshes, and even urban settings, demonstrating their incredible ability to thrive in diverse environments. Yet, their presence in Europe as an invasive species since 2016 highlights the challenges of their adaptability. Despite their common portrayal as solitary animals, recent evidence suggests a complex social structure, where related females share territories and unrelated males form groups. This social behavior, coupled with their adaptability, has ensured the raccoon’s survival in various climates and conditions, although their lifespan in the wild typically ranges from 1.8 to 3.1 years due to factors like hunting and vehicular accidents. From their evolution, which traces back to the late Oligocene, to their current status as both cherished and challenged beings, raccoons continue to intrigue and inspire curiosity about the natural world around us.

Meta Description:

Dive into the intriguing world of the raccoon, Procyon lotor, a true marvel of North America’s wildlife. Discover the intelligence, adaptability, and complex social structures of these nocturnal creatures, from their diverse habitats to their survival tactics in both wild and urban environments.


def generate_content_with_prompt(prompt, system_message, max_tokens):
    log_api_request("/v1/chat/completions", {"model": AI_MODEL, "messages": [{"role": "system", "content": system_message}, {"role": "user", "content": prompt}], "temperature": 0.7, "max_tokens": max_tokens})
    response = openai.ChatCompletion.create(
        model=AI_MODEL,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=max_tokens
    )
    return response

def generate_title_and_meta(keyword, background_info):
    prompt = f"Generate a title and meta description for an article about '{keyword}' based on the following background information:\n\n{background_info}\n\nTitle:\nMeta Description:"
    system_message = "You are a helpful assistant that generates article titles and meta descriptions."
    return generate_content_with_prompt(prompt, system_message, 100)

def generate_section_content(keyword, background_info, section_outline):
    prompt = f"Write a blog post section in a plain text format focusing on '{section_outline}' for the article '{keyword}'. Please use {background_info}. Please limit the content to 5 to 8 sentences or 3 to 4 very small, easy-to-read paragraphs. Additionally, include an informative section in an enthusiastic, engaging tone about '{section_outline}'. Maintain an upbeat, conversational style throughout that informs readers while keeping their interest."
    system_message = "You are a helpful assistant that generates article section content."
    return generate_content_with_prompt(prompt, system_message, 400)

"This refactoring creates a generate_content_with_prompt function to handle the repetitive task of logging the API request and calling openai.ChatCompletion.create with the appropriate parameters. This function is then used by both generate_title_and_meta and generate_section_content, with only the specific parameters for each task being passed in. This approach reduces redundancy and makes your code more maintainable." -GPT4

Maybe late to this, but isn't this what the Assistants API allows you to do?
Create the assistant once with a system prompt and then feed it other prompts from user land?

https://platform.openai.com/docs/assistants/how-it-works/runs-and-run-steps
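Roughly, based on those docs, the flow with the openai Python SDK (v1.x, beta endpoints) would look like the sketch below. I haven't run this here, since it needs an API key, so treat it as an outline rather than a drop-in script; write_article_on_thread is a made-up name:

```python
def write_article_on_thread(background_info: str, tasks: list[str]):
    """Create one assistant and one thread, then post each task as a new
    message on the same thread so the background is not resent per task."""
    from openai import OpenAI  # pip install openai>=1.0

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # The assistant is created once; its instructions play the role of
    # the system prompt, so the 2000 words live server-side.
    assistant = client.beta.assistants.create(
        name="Article writer",
        instructions=f"Use this background information:\n{background_info}",
        model="gpt-4-turbo-preview",
    )

    # One thread stores the whole conversation server-side.
    thread = client.beta.threads.create()

    for task in tasks:
        client.beta.threads.messages.create(
            thread_id=thread.id, role="user", content=task
        )
        run = client.beta.threads.runs.create(
            thread_id=thread.id, assistant_id=assistant.id
        )
        # Runs are asynchronous: poll client.beta.threads.runs.retrieve
        # until run.status == "completed", then read the reply via
        # client.beta.threads.messages.list(thread_id=thread.id).
```

Note that you are still billed for the tokens the thread feeds into each run, so this simplifies the bookkeeping more than it guarantees a lower bill.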


Both the assistant endpoints and the chat completion endpoint allow this behavior; the difference is just where the conversation history is stored.

OpenAI will handle the conversation history, files and tools (like a vector store and code interpreter) if you use the assistants API, but you’ll have to handle these yourself if you decide to use the chat completion endpoint. :hugs:

You're talking about memory? Is that something you can toggle on/off so that normal chat completions have context stored server-side?

In any case, I'm not sure how, with the current API, you would re-use a chat instance or get a session ID so it would know which memory to use. There is a user param on chat completions, but the docs say that's more for abuse monitoring:
https://platform.openai.com/docs/guides/safety-best-practices/end-user-ids

The Assistants API seems the way to do it. But I'd prefer to avoid that, as I want to run some of these in bulk alongside other, less capable APIs…