GPT bot with embeddings + keeping context of previous questions/answers

Hey everyone,
I created a Python script that calls the GPT API to answer questions based on an embeddings file.
The flow is as follows: I have an embeddings file with vectors assigned to all of my training data.
The first step is to create an embedding from my input question, then call a distance function against the source embeddings to match the question I was given to the learned data.
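A minimal sketch of that distance step, with toy vectors standing in for real embeddings (cosine distance is one common choice here; the actual embedding API call is omitted):

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity; smaller means more similar
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def top_k(question_vec, source_vecs, k=3):
    # rank the source rows by distance to the question embedding
    dists = [cosine_distance(question_vec, v) for v in source_vecs]
    return sorted(range(len(source_vecs)), key=lambda i: dists[i])[:k]

# toy 2-D vectors standing in for real embedding rows
source = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
question = np.array([0.9, 0.1])
print(top_k(question, source, k=2))  # → [0, 2]
```

The returned indices point at the X closest chunks, whose text then goes into the context.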

Then I pass a few simple prompts, one of which directs GPT to answer based on the context I generated from the X answers whose vectors are closest to my question.

If I ask a single question, GPT returns the right answer based on the training data (the distance is calculated between the question and the training data).
For example:
If I ask "how many elderly are in the US?" and my training data has information relevant to this question, it can return "there are 90 million elderly in the US".
But if I then ask "how about in England?", it will try to build a context from that question alone, without knowing that the user means "how many elderly are in England?". It then answers randomly or just says it can't answer.
I tried to work around this by building the context from all historical answers, so the data can also be matched against the combined question "how many elderly are in the US, how about in England?". Then it answers more or less correctly.
But then I ran into another issue: what if the first answer is sufficient and the user then asks a completely different question? I don't want to compute the context with a previous question that has nothing to do with the new one; that results in very bad answers.
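The workaround above, building the retrieval query from the whole history, can be sketched like this (hypothetical helper name):

```python
def retrieval_query(history, new_question, use_history=True):
    # concatenate every prior question with the new one, so a vague
    # follow-up still carries the earlier wording into the embedding
    if use_history:
        return " ".join(q for q, _ in history) + " " + new_question
    return new_question

history = [("how many elderly are in the US?",
            "there are 90 million elderly in the US")]
print(retrieval_query(history, "how about in England?"))
# → how many elderly are in the US? how about in England?
```

Which is exactly where the problem shows up: when the new question is unrelated, the old wording still leaks into the embedding and drags in the wrong chunks.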
Has anyone encountered such a use case and knows how to handle it?
I'm passing GPT the messages as prompts + a context containing the vector-closest pieces of my training data, followed by the list of questions and answers.
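That message layout might look roughly like this (a sketch with hypothetical names, assuming the standard Chat Completions message format):

```python
def build_messages(context_chunks, history, new_question):
    # context = the X vector-closest chunks, joined into one block
    context = "\n".join(context_chunks)
    messages = [
        {"role": "system",
         "content": "Answer only from the context below.\n\nContext:\n" + context},
    ]
    # replay prior Q/A turns so the model sees the conversation so far
    for q, a in history:
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": new_question})
    return messages
```

Note that this history only helps the chat model phrase its answer; it does nothing for the retrieval step, which is where the follow-up problem lives.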

In short: embeddings and keeping context across previous questions/answers.

So you are using RAG to ask questions. The questions seem to be very basic; do you actually need the knowledge base you are using to answer them? If you really need to use RAG, maybe don't pass the whole history + new user message to the embeddings model, just the very last question? That will cut down on the amount of "useless context" that gets used.

I tried to create the context based on the last question only, but if it's too vague, like in the example, the context is built incorrectly.
For example:
I have data on Twitter users between 2020-2023. I send the question "how many active users were there on Twitter for 2020-2022?", it goes through the context step, and the answer comes back correctly. Then I ask "what about 2023?". As a human you understand that this means "how many active users on Twitter for 2023?" (I have the data), but when I compare "what about 2023?" to my embeddings file I get irrelevant data, so the answer I get is wrong.