How do I make the assistant keep the conversation context?

Hi there! I’ve created an assistant for chatting with my clients in the real estate field.
I send requests and get responses via the API using Python code. My clients message me on WhatsApp, and each message is forwarded to the GPT assistant. The problem is that the assistant doesn’t remember the context: it treats every new message from a client as a completely fresh one, with no connection to the previous ones, so it can’t hold a complete, meaningful conversation.
Any advice, please?

Here’s a plausible demonstration of a contextual WhatsApp exchange between a user and a real estate AI assistant:


AI Assistant:
Hi Alex! :waving_hand: I’m RealtyBot, your personal real estate assistant. Interested in checking out some properties today or need help with buying or renting?

User:
Hey RealtyBot, I’m considering buying. I’d like a 2-bedroom condo downtown, preferably near a park or public transport.

AI Assistant:
Great! :cityscape: I found several 2-bedroom condos downtown within walking distance to parks or public transit. What’s your approximate budget?

User:
About $500,000. Can you share some listings?

AI Assistant:
Perfect! Here are three excellent condos within your budget:

  1. The Parkside Towers – 2BR, $485,000, next to Riverside Park, 2 mins walk to Metro Station.
  2. Maple Residences – 2BR, $499,000, 5 mins from Central Park and downtown bus hub.
  3. Urban Vista Condos – 2BR, $475,000, overlooking City Garden, close to tram stops.

Would you like more details on any of these?

User:
Can you tell me more about the second option?

AI Assistant:
Sure! Maple Residences is a modern 2-bedroom condo priced at $499,000. It’s located on the 9th floor, offers great city views, includes underground parking, gym access, and is pet-friendly. Only a short 5-minute walk to Central Park and the downtown bus hub. :bus::deciduous_tree:

Would you like to schedule a viewing?

User:
Does it include any HOA fees or monthly maintenance costs?

AI Assistant:
Yes, Maple Residences has a monthly HOA fee of $420, covering common area maintenance, gym access, building security, and landscaping. Utilities such as water are included, but electricity and internet are extra. Would you like estimated utility costs too?

User:
Please do! And how about nearby schools?

AI Assistant:
On average, electricity and internet combined cost residents approximately $120–$150 per month. As for schools, the condo is zoned for Evergreen Elementary School and Downtown Middle School, both rated highly (8/10). :school: Would you like more info or assistance with financing options?

User:
Can you check what kind of mortgage payments I’d expect at this price?

AI Assistant:
Absolutely! For a purchase price of $499,000, assuming a 20% down payment (~$99,800) and a 30-year fixed-rate loan at 6% interest, your monthly mortgage payment (principal and interest) would be approximately $2,393. :bank:

Would you like help connecting with a lender to explore financing?

User:
Yeah, connect me with someone, please.

AI Assistant:
Fantastic! I’ve notified Lisa Roberts, a top-rated local mortgage specialist who will reach out to you shortly to help explore financing options. :telephone_receiver::speech_balloon:

In the meantime, would you like to schedule a viewing for Maple Residences?
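As an aside, the monthly payment the assistant quotes can be verified against the standard amortisation formula:

```python
# Sanity check of the quoted mortgage payment:
# payment = loan * r / (1 - (1 + r) ** -n), with monthly rate r and n payments.
price, down_pct = 499_000, 0.20
loan = price * (1 - down_pct)      # $399,200 financed after the 20% down payment
r = 0.06 / 12                      # 6% annual interest -> monthly rate
n = 30 * 12                        # 30-year fixed -> 360 monthly payments
payment = loan * r / (1 - (1 + r) ** -n)
print(round(payment))              # 2393, matching the figure in the reply
```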


This conversation demonstrates how the assistant continually leverages the chat history, enabling it to answer follow-up questions that depend on prior context.

It would fail without a “memory”.

The thing is: individual API calls to chat models are not stateful, and a developer’s account may have hundreds of “active” user sessions, along with past chats that can be rejoined.

Therefore, it is you who must keep track of the user and send back the conversational context: several previous user inputs and assistant outputs, in a list that ends with the latest user input.

What you send in a future API call is that entire list of previous messages, with the latest user input appended. The AI can then follow and infer from the whole growing chat that has been re-sent.
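For example, here is a minimal in-memory sketch of that growing list. Names such as `histories`, `build_messages`, and `record_reply` are illustrative, not part of any SDK:

```python
# Keep one message list per WhatsApp user; re-send the whole list on each call.
histories = {}  # whatsapp_user_id -> list of {"role": ..., "content": ...}

SYSTEM = {"role": "system", "content": "You are RealtyBot, a real estate assistant."}

def build_messages(user_id, text):
    """Append the latest user input and return the full list to send."""
    history = histories.setdefault(user_id, [])
    history.append({"role": "user", "content": text})
    return [SYSTEM] + history

def record_reply(user_id, reply):
    """Store the assistant's answer so the next call sees it too."""
    histories[user_id].append({"role": "assistant", "content": reply})

# In your webhook handler you would then do something like:
#   resp = client.chat.completions.create(model=..., messages=build_messages(uid, text))
#   record_reply(uid, resp.choices[0].message.content)
```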

There are multiple AI endpoints that you might be interacting with. OpenAI has Chat Completions, Responses, and the deprecated Assistants platform. Chat Completions is likely what you are using; it takes a full “messages” list every time. Responses can work the same way, but it also lets you send a “previous response ID” that refers to a server-side record of the chat, one you stored with the “store” parameter. Even then, you still must track the user and the chat session they are continuing.

The first thing to implement is actual user tracking in the bot. You need to know which WhatsApp user is in which chat session, give them the opportunity to start a new chat, and never confuse users or allow cross-session data leakage. You would also need an expiration policy for your data and for a particular session of messages.
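A sketch of that per-user session bookkeeping with a simple inactivity expiry. The 24-hour TTL, the field names, and `get_session` are all assumptions for illustration:

```python
import time

SESSION_TTL = 24 * 60 * 60  # assumption: expire a session after 24h of inactivity

sessions = {}  # whatsapp_user_id -> {"thread_id": ..., "last_seen": ...}

def get_session(user_id, now=None):
    """Return the user's live session, starting a fresh one if it expired."""
    now = time.time() if now is None else now
    s = sessions.get(user_id)
    if s is None or now - s["last_seen"] > SESSION_TTL:
        s = {"thread_id": None, "last_seen": now}  # None => start a new chat
        sessions[user_id] = s
    s["last_seen"] = now
    return s
```

Keying everything on the WhatsApp user ID is what prevents cross-session leakage: two users can never resolve to the same session record.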

  1. Treat each client as a separate thread

The Assistants API keeps the entire message history that belongs to a single thread.
If you reuse the same thread_id (creating it on the user’s first message, looking it up afterwards) every time that user writes, OpenAI will automatically supply the full context to the model; you don’t need to prepend previous messages yourself.

Typical flow

┌─ New inbound WhatsApp msg ─────────────────────────────┐
│ 1. Look up client_id → thread_id in your DB            │
│    • if none: POST /threads → save returned id         │
│ 2. POST /threads/{thread_id}/messages (add message)    │
│ 3. POST /threads/{thread_id}/runs (run assistant)      │
│ 4. Deliver assistant’s reply back to WhatsApp          │
└────────────────────────────────────────────────────────┘
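The same flow in Python, using method names from the openai SDK’s beta namespace. Here `client` would be an `OpenAI()` instance, `db` any mapping you persist, and `assistant_id` your own assistant’s ID; this is a sketch, not a drop-in implementation:

```python
def handle_inbound(client, db, assistant_id, user_id, text):
    # 1. Look up client_id -> thread_id; create a thread on first contact.
    thread_id = db.get(user_id)
    if thread_id is None:
        thread_id = client.beta.threads.create().id
        db[user_id] = thread_id
    # 2. Add the inbound WhatsApp message to the thread.
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=text)
    # 3. Run the assistant on the thread and wait for completion.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=assistant_id)
    # 4. Fetch the newest message (the reply) to send back via WhatsApp.
    msgs = client.beta.threads.messages.list(thread_id=thread_id, run_id=run.id)
    return msgs.data[0].content[0].text.value
```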

  2. Persist thread IDs in a lightweight store

A Redis hash, SQLite table, or any CRM field that maps whatsapp_user_id → thread_id works.
Create the mapping on the very first message; reuse afterward. This single step usually fixes the “stateless” behaviour you’re seeing.
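For instance, a SQLite version of that mapping (the table and column names, and the `thread_for` helper, are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # point at a real file in production
conn.execute("""CREATE TABLE IF NOT EXISTS threads (
                  whatsapp_user_id TEXT PRIMARY KEY,
                  thread_id        TEXT NOT NULL)""")

def thread_for(user_id, create_thread):
    """Return the stored thread id, creating and saving one on first contact."""
    row = conn.execute(
        "SELECT thread_id FROM threads WHERE whatsapp_user_id = ?",
        (user_id,)).fetchone()
    if row:
        return row[0]
    thread_id = create_thread()  # e.g. lambda: client.beta.threads.create().id
    conn.execute("INSERT INTO threads VALUES (?, ?)", (user_id, thread_id))
    conn.commit()
    return thread_id
```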

  3. Mind the token budget as threads grow

Because the API automatically passes the whole thread to the model, long conversations can get expensive.  Two common strategies:

| Strategy | How it works | When to use |
|---|---|---|
| Summarise & trim | Periodically summarise earlier turns into a single system note, then delete or archive the original messages. | High-volume chat with many turns per client. |
| Split by topic | Start a fresh thread when the subject changes (e.g., after closing on a property). | When older details are rarely needed later. |

The API automatically truncates oldest content when it nears the model’s context window, but doing your own summarisation gives you predictable costs and faster responses. 
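A sketch of the summarise-and-trim strategy on a plain message list. `KEEP_TURNS` and the `summarise` callback (which in practice would itself call the model) are assumptions:

```python
KEEP_TURNS = 6  # assumption: keep the most recent 6 messages verbatim

def compact(history, summarise):
    """Fold everything older than the last KEEP_TURNS into one system note."""
    if len(history) <= KEEP_TURNS:
        return history
    old, recent = history[:-KEEP_TURNS], history[-KEEP_TURNS:]
    note = {"role": "system",
            "content": "Summary of earlier conversation: " + summarise(old)}
    return [note] + recent
```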

  4. Store long-term knowledge outside the thread

Client preferences (budget, neighbourhoods, etc.) or your agency’s property database shouldn’t be repeated in every message. Options:
• Vector store + retrieval tool – Upload PDFs/listings to the assistant’s file store and let the built-in file_search tool pull snippets when relevant. 
• Your own DB + function calling – Write a “get_client_profile” function, expose it via the Assistants API’s tool schema, and let the model call it when it needs persistent facts.
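A hypothetical tool schema for such a `get_client_profile` function, in the JSON shape OpenAI’s function tools use (the name, description, and parameter are illustrative):

```python
GET_CLIENT_PROFILE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_client_profile",
        "description": "Fetch a client's stored preferences "
                       "(budget, neighbourhoods, must-haves).",
        "parameters": {
            "type": "object",
            "properties": {
                "whatsapp_user_id": {
                    "type": "string",
                    "description": "The client's WhatsApp identifier.",
                },
            },
            "required": ["whatsapp_user_id"],
        },
    },
}
```

When the model calls this tool, your code looks up the profile in your own database and returns it as the tool result, so persistent facts never need to live inside the thread itself.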

  5. Keep an eye on upcoming changes

OpenAI has signalled that the newer Responses API and an Agents SDK will supersede the Assistants API by mid-2026, but the same thread-per-user pattern is expected to remain. Early adopters are already migrating with minimal code change.