GPT-3.5-turbo: how to remember previous messages like the ChatGPT website

I believe text-davinci-003 uses “prompt” instead of role/content messages.
Maybe try adjusting the model to gpt-3.5-turbo in the code. :slight_smile:

I modified my code to the one below but still get this error (Error: Request failed with status code 404)


let conversation = [{
    role: 'system',
    content: "You will follow the conversation and respond to the queries asked by the 'user's content. You will act as the assistant"
}];

app.post('/chat', async (req, res) => {
  try {
    // prompt += 'user: ' + req.body.prompt + '\n'
    let prompt = req.body.prompt;
    conversation.push({
        role: 'user',
        content: prompt
    });
    const response = await openai.createCompletion({
      model: "gpt-3.5-turbo",
      messages: conversation,
    });
    // prompt += 'assistant: ' + response.data.choices[0].text.trim() + '\n\n'
    // console.log(prompt)
    conversation.push({
        role: 'assistant',
        content: response.data.choices[0].message.content
    });
    res.send(response.data);
  } catch (error) {
    res.status(500).send(error || 'Something went wrong');
  }
});

The API is stateless, which means you have to send all the previous messages you want used as chat history with each new request.

This is a great, relevant topic, because I see little point in not keeping context. I mulled over doing it myself, but then I found a great tutorial made by Voiceflow that also features Alexa integration. It basically boils down to JavaScript arrays and appending replies to the initial array. Something like:

messages = []
messages.push({"role": "user", "content": user_reply})
// ... send messages to the API, receive gpt_reply ...
messages.push({"role": "assistant", "content": gpt_reply})

I’m blocked from adding links to this forum but if you go to Voiceflow’s blog and look in the Developers blog category, the article is titled: “How to create an Alexa skill with GPT-4 and Voiceflow”

@rbritom's example is valid, but what about long conversations? If I pass the whole message history then I'll get a token limit error. What can I do?

What to do about long conversations? That has many answers, with progressively better quality of memory. For each, we first calculate and store the tokens used by each user input and AI response (and account for the overhead of the chat format):

  • Discard older conversation turns that won't fit into the remaining context space, after locally calculating the tokens for the prompt, user input, and max_tokens reservation.
  • Use another AI to periodically summarize turns or the oldest conversation.
  • Use a vector database and AI embeddings to remember the whole conversation and pass prior exchanges that are most relevant to the current conversation flow.
  • More advanced context-aware conversation thread topic tracking systems.
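The first strategy in the list above can be sketched in a few lines of Python. Note that `count_tokens` below is a crude whitespace approximation of my own, standing in for a real tokenizer such as tiktoken:

```python
# Strategy 1 sketch: drop the oldest turns until the history fits a budget.

def count_tokens(message):
    # Rough stand-in: real code would run tiktoken on message["content"].
    return len(message["content"].split()) + 4  # +4 per-message overhead

def trim_history(system, history, user, budget):
    """Keep the system message, the new user message, and as many of the
    most recent history turns as fit within `budget` tokens."""
    fixed = count_tokens(system) + count_tokens(user)
    kept = []
    for message in reversed(history):   # walk newest-first
        cost = count_tokens(message)
        if fixed + cost > budget:
            break                       # everything older is discarded
        fixed += cost
        kept.insert(0, message)         # restore chronological order
    return [system] + kept + [user]
```

Calling `trim_history(system_msg, chat, user_msg, budget)` yields a messages list that always starts with the system message, ends with the new user message, and keeps only as many recent turns as the budget allows.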

Another option for the advanced user is a GUI that allows self-management of the conversation and selection of past turns to be sent.


I have found the solution: you need to count the tokens before passing the messages to the OpenAI API.
Tiktoken helped me a lot; it can count the tokens of a messages list in any language. I have added code below, please check. You need to update the messages list: for example, if your token limit is exceeded, you can remove the very first messages to resolve the token limit error.
If you still have any questions or face any issues, feel free to contact me on LinkedIn: aliahmadjakhar

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"num_tokens_from_messages() is not implemented for model {model}. "
            "See the documentation for information on how messages are converted to tokens."
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

Use openai.createChatCompletion; it's simple :grinning:

Does openai.createChatCompletion have no token limit? What happens if the token limit is exceeded?

I developed this node.js script to save every GPT-4 request and response into a JSON file.

It then sends the file content as conversation history with each new prompt.

// Import required modules
const fs = require('fs');
const axios = require('axios');

// Your OpenAI API key
const apiKey = 'your-openai-api-key';

// Function to interact with OpenAI API
async function interactWithAI(userPrompt) {
    try {
        // Define the message data structure
        let messageData = { 'messages': [] };

        // If requests.json exists, read and parse the file
        if (fs.existsSync('requests.json')) {
            let raw = fs.readFileSync('requests.json');
            messageData = JSON.parse(raw);
        }

        // Format the conversation history and the new user request
        let systemMessage = "Conversation history:\n" + messageData['messages'].map(m => `${m.role} [${m.timestamp}]: ${m.content}`).join("\n");
        let userMessage = "New request: " + userPrompt;

        // Make a POST request to OpenAI's chat API
        let response = await axios({
            method: 'post',
            url: 'https://api.openai.com/v1/chat/completions',
            headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
            data: { 'model': 'gpt-4', 'messages': [ { "role": "system", "content": systemMessage }, { "role": "user", "content": userMessage } ] }
        });

        // Log the AI's response
        let aiReply = response.data['choices'][0]['message']['content'];
        console.log(aiReply);

        // Get the current timestamp
        let timestamp = new Date().toISOString();

        // Add the new user request and the AI's response to the message history
        messageData['messages'].push({ "role": "user", "content": userPrompt, "timestamp": timestamp });
        messageData['messages'].push({ "role": "assistant", "content": aiReply, "timestamp": timestamp });

        // Write the updated message history to requests.json
        fs.writeFileSync('requests.json', JSON.stringify(messageData, null, 2));

        // Return the AI's response
        return aiReply;
    } catch (e) {
        // If an error occurred, log it to the console and return an error message
        console.error('An error occurred:', e);
        return 'An error occurred while interacting with the OpenAI API. Please check the console for more details.';
    }
}
cc: @abdeldroid


But when I try ChatGPT, I ask the same question twice with slightly different tones. The first time it replies with an answer and adds "…please notice this is only a fair answer, blah blah blah", and the second time it answers again, but this time it says "Again I have to remind you, blah blah blah". It seems to me that ChatGPT knows I asked the same question twice. How does it do this?

This discussion is about programming the API, where the programmer must construct each new call to the API AI model by including their own “memory” of what was previously said.

In ChatGPT (the website chatbot you can talk with), this conversation history database and the management of what prior conversation the AI sees when it answers is handled by the ChatGPT web and server software and its programmers.

Oh! I see. So I have to redesign the whole chatting ability myself when using the API. That's really a challenge for me.


I wrote up a very small python gpt-3.5-turbo chatbot for you:

  • Complete record of conversation, "chat",
  • passes 5 previous question/answers from chat to the AI so the topic is clear,
  • streaming generator, so you receive words as they are created,
  • has no API error handling (it tolerates no API errors),
  • edit in your own API key to use; type exit to leave

import openai
openai.api_key = "sk-xxxxx"
system = [{"role": "system", "content": "You are a helpful AI assistant."}]
user = [{"role": "user", "content": "Introduce yourself."}]
chat = []
while not user[0]['content'] == "exit":
    response = openai.ChatCompletion.create(
        messages = system + chat[-10:] + user,
        model="gpt-3.5-turbo", stream=True)
    reply = ""
    for delta in response:
        if not delta['choices'][0]['finish_reason']:
            word = delta['choices'][0]['delta']['content']
            reply += word
            print(word, end ="")
    chat += user + [{"role": "assistant", "content": reply}]
    user = [{"role": "user", "content": input("\nPrompt: ")}]

(revised: the AI provides an introduction so we immediately check connection, and one less conditional.)

The openai library requires python 3.7-3.9. Install the module with
pip install --upgrade openai

The next level is to use a token counter (tiktoken), record the size of each chat message, and use some size criteria to determine how many past messages to send.

To be clear, the messages are just injected directly in-prompt, correct? OpenAI doesn't do any fancy summarization of previous messages or embedding/retrieval like they do on ChatGPT, correct?

You provide the past conversation yourself by your code. A more advanced chatbot can make occasional summaries, can use a vector database to retrieve relevant chats that are even older, can continue increasing the number of turns up to the context length limit, or instead intelligently cut chat to minimum, and more.
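As a sketch of the occasional-summary idea mentioned above (with `summarize()` as a stub of my own; a real version would ask a model to condense the old turns):

```python
# Sketch: once history grows past max_turns messages, collapse everything
# but the most recent turns into a single summary message.

def summarize(turns):
    # Stub: a real implementation would call the model to summarize these turns.
    return "Summary of %d earlier messages." % len(turns)

def compact_history(history, max_turns=8, keep_recent=4):
    """If history exceeds max_turns messages, replace all but the most
    recent keep_recent messages with one summary message."""
    if len(history) <= max_turns:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    note = {"role": "system", "content": summarize(old)}
    return [note] + recent
```

Run `chat = compact_history(chat)` before each request: short conversations pass through unchanged, while long ones shrink to a summary note plus the latest turns.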

What my code does is the typical basic operation:

  1. always pass the system instruction unaltered as the first message;
  2. then pass the history user prompts and AI assistant replies as role messages (up to the most recent 10 messages total);
  3. then add the current user prompt role message that needs an answer.

I coded a particular way for the forum so that future features like calling functions and their returns are easily recorded in chat history, which don’t have a distinct input/output pairing like the user messages. The history being a python dictionary, other metadata like input or reply tokens can be added to chat history items after receiving the successful AI response or using tiktoken. The “chat” dictionary is the full session history in case one wanted to save or reload it by command.

Proper API error handling and retrying is alone at least twice as many lines of code as I demonstrate.

My own local API chatbot is 1500+ lines of GUI Python that someday I'll continue plugging away at under the hood, after tedious GUI widgets like an expanding input box; an option to send only after multiple return presses (or "submit" and resubmit like the playground, or just add to history); expanding panes; a colored status bar with actions based on errors; font and element resizing; auto-collapsing, editable, rearrangeable history; live greying of unpassed history adaptive to token sliders and input; and other non-features.


When passing related prior exchanges, can I send only the vector database links of the related exchanges, or do I have to send the whole related text contents?

As I remember, OpenAI doesn't accept external links.

A vector database for conversation would naturally store the individual conversation turns. Embeddings is a special search method that looks not just for keywords, but for meaning. One could then retrieve older baseball chat from the conversation database if the subject being discussed has returned to baseball.

One could also have those longer conversation exchanges summarized or given context by another AI, so that they can be more compact or stand alone.

There is no “link”, one would provide the past conversation turns just as if they actually happened (with the appearance of other missing conversation if you were to read the new transcript the AI receives).
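As a toy sketch of the retrieval step described above (the vectors here are hand-made stand-ins; a real system would obtain them from an embeddings model and likely keep them in a vector database):

```python
# Rank stored conversation turns by cosine similarity to the current query.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def most_relevant(query_vec, store, k=2):
    """store: list of (embedding, turn_text) pairs; return the k turns
    whose embeddings are most similar to query_vec."""
    ranked = sorted(store, key=lambda pair: cosine(query_vec, pair[0]),
                    reverse=True)
    return [turn for _, turn in ranked[:k]]
```

The retrieved turns are then placed into the messages list as if they had just happened, exactly as described above.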


Is there any model which preserves previous chat, like GPT-4 on ChatGPT?

Hi and welcome to the Developer Forum!

There is currently no model that retains context across API calls.