How to update embeddings in near real time?

I read the OpenAI documentation and browsed the web, researching the topic of embeddings.

It’s definitely a very interesting topic and a very new way of using your own knowledge base with GPT, but in all the research I’ve done I haven’t found anything that clearly shows how to update those embeddings.

In a company the data keeps changing: sometimes over months, sometimes within seconds. I know OpenAI isn’t ready for real-time data updates yet, but would it be possible and maintainable to do this with embeddings? Sometimes a user uploads a change only to fix a typo or to swap one paragraph for another, so imagine creating new embeddings every time a change is uploaded.

Another question that is not clear to me: don’t you lose the context of your knowledge base when you split the data into chunks?

Currently my use case is to answer questions from my knowledge base and also to generate responses on any other topic. The first part is the problem, and I have tried different tools for it. I am currently using Cognitive Search for indexed search, but the texts can run to more than 13,000 characters. On top of that, I format the text and insert markers like {{IMG}} into it. In my setup, the flow would be something like this:

Query: How to collect cash with the application?
Cognitive Search: Performs the search in the index. The index covers the files in a container and returns a JSON document containing the text “Collection expires… {{IMG}}” and an array of images [“img1”, “img2”]
GPT: Receives that text as context and returns a response such as “In the application, you can configure charging as follows…{{IMG1}}”
Text processing: In the GPT response, each {{IMG}} marker is replaced by the image at the corresponding index in the array
Send response: The processed response is sent to the user
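
The text-processing step in this flow could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the function name `fill_image_placeholders`, the `[image: ...]` output format, and the choice to drop markers that have no matching image are all assumptions.

```python
import re

def fill_image_placeholders(text: str, images: list[str]) -> str:
    """Replace {{IMG}} / {{IMGn}} markers with entries from `images`."""
    counter = 0  # next position to use for un-numbered {{IMG}} markers

    def substitute(match: re.Match) -> str:
        nonlocal counter
        if match.group(1):
            # Numbered form such as {{IMG2}}, treated as 1-based (assumption).
            index = int(match.group(1)) - 1
        else:
            # Bare {{IMG}}: take the next unused image in order.
            index = counter
            counter += 1
        if 0 <= index < len(images):
            return f"[image: {images[index]}]"
        # The model produced a marker with no matching image; drop it.
        return ""

    return re.sub(r"\{\{IMG(\d*)\}\}", substitute, text)
```

Because the model may omit or invent markers, it is probably safer to handle both cases explicitly, as above, than to assume the response always contains exactly one marker per image.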

Of course, I can’t control what GPT responds with. Even if the text contains {{IMG}}, there will be times when GPT ignores it, and the token count becomes too large, which increases the latency of the OpenAI response. I started using Cognitive Search because I loved Microsoft’s proposal to combine both tools, but unfortunately I have not been able to use them the way they do.

I found some threads here that discuss embeddings, but I don’t see them resolve these questions. It would be good to have answers to all of them, because in the end these are “new” technologies that we need to learn to apply correctly in our use cases. If we don’t know how to use them, how can we get the most out of them? And if they are open source, it is because anyone can have access to the information. Thank you very much for your answers.


How you update embeddings depends on what you use to store them, but generally you would store each embedding with some metadata such as a document ID. In your vector store you can then delete by ID, generate new embeddings with OpenAI, and insert them. This isn’t specific to OpenAI, so consult the documentation or community of whatever vector database you are using for best practices.
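
As a rough sketch of that delete-by-ID, re-embed, insert cycle: the `InMemoryVectorStore` class and `fake_embed` function below are hypothetical stand-ins, not the API of any real vector database or of OpenAI. In practice `fake_embed` would be replaced by a call to an embeddings API and the dictionary by your vector store.

```python
import hashlib

def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embeddings call; returns a deterministic toy vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:8]]

class InMemoryVectorStore:
    """Toy store keyed by chunk ID, with the document ID kept as metadata."""

    def __init__(self):
        self.rows = {}  # chunk_id -> {"doc_id", "text", "vector"}

    def delete_document(self, doc_id: str) -> int:
        """Remove every chunk that belongs to `doc_id`; return how many were removed."""
        stale = [cid for cid, row in self.rows.items() if row["doc_id"] == doc_id]
        for cid in stale:
            del self.rows[cid]
        return len(stale)

    def upsert_document(self, doc_id: str, chunks: list[str], embed=fake_embed) -> None:
        """Delete the old chunks for `doc_id`, then embed and insert the new ones."""
        self.delete_document(doc_id)
        for i, chunk in enumerate(chunks):
            self.rows[f"{doc_id}:{i}"] = {
                "doc_id": doc_id,
                "text": chunk,
                "vector": embed(chunk),
            }
```

Deleting before inserting matters when an edit reduces the number of chunks; otherwise leftover chunks from the old version would still be returned by searches.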

The API to create embeddings is pretty fast and cheap, so it’s feasible to call it on every update. If you find that this is too much for your use case, you could write a scheduled task that queries your database for content changed in the last X minutes and creates new embeddings from that.
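
A minimal sketch of such a scheduled task, assuming each record is a dictionary with `id`, `text` and a timezone-aware `updated_at` field (these field names, and the `reembed` callback, are assumptions for illustration). The function would be run by whatever scheduler you already have, such as cron.

```python
from datetime import datetime, timedelta, timezone

def changed_since(records, now, minutes):
    """Return the records whose `updated_at` falls within the last `minutes`."""
    cutoff = now - timedelta(minutes=minutes)
    return [r for r in records if r["updated_at"] >= cutoff]

def run_sync(records, reembed, now=None, minutes=15):
    """Re-embed recently changed records and return their IDs.

    `reembed(record_id, text)` is a caller-supplied function that replaces
    the stored embeddings for one record.
    """
    now = now or datetime.now(timezone.utc)
    changed = changed_since(records, now, minutes)
    for record in changed:
        reembed(record["id"], record["text"])
    return [r["id"] for r in changed]
```

Batching this way also means that several quick edits to the same document within one window, such as a typo fix followed by another, cost a single re-embedding rather than one per save.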

On losing context when splitting: to some extent, yes. You’ll need to experiment with the chunk size you use when splitting data to ensure each chunk carries enough context for GPT. This varies by content, so it’s not something we can prescribe. If all the documents are written concisely, you can get away with smaller chunks.
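
To make the chunk-size experiment concrete, here is a minimal character-based splitter with overlap between neighbouring chunks. The overlap is an addition that the reply does not mention, and the default sizes are arbitrary starting points rather than recommendations; splitting on sentences or tokens may suit your data better.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the chunk just added already reaches the end of the text
    return chunks
```

Trying a few `chunk_size` values against real questions from your users is usually the quickest way to see which setting gives GPT enough context.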


Embeddings are for static data like books, manuals or other documents that are rarely updated. If your manual or document is updated, you need to create new embeddings, remove the old embeddings from your system and use the new ones.

If your application needs to fetch dynamically changing data (e.g. seasonal products, offers, etc.), use function calling: you fetch the data from your external API, and the LLM acts as the go-between that prepares the commands for your external API and presents the result in a conversational tone.
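
A rough, offline sketch of the application side of function calling follows. The tool description uses the JSON-schema style of OpenAI function calling, but the request to the model is deliberately left out. `get_current_offers`, its sample data, and `dispatch_tool_call` are hypothetical names invented for this example; the real function would call your own external API.

```python
import json

# Tool description passed to the model so it knows what it may ask for.
GET_OFFERS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_offers",
        "description": "Return the offers currently available in a product category.",
        "parameters": {
            "type": "object",
            "properties": {"category": {"type": "string"}},
            "required": ["category"],
        },
    },
}

def get_current_offers(category: str) -> list:
    """Hypothetical stand-in for a call to your own live API."""
    live_data = {"shoes": [{"name": "Trail runner", "discount": "20%"}]}
    return live_data.get(category, [])

HANDLERS = {"get_current_offers": get_current_offers}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the function the model asked for and return its result as JSON.

    The model supplies the arguments as a JSON string; the returned string is
    what you would send back to the model so it can phrase the final answer.
    """
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handler(**json.loads(arguments_json)))
```

Since the data is fetched at question time, nothing has to be re-embedded when an offer changes.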


In my scenario, I use the Drupal CMS for inserting/updating/deleting content (nodes, paragraphs, comments, files). Every piece of content has an ID that uniquely identifies it with its embedded counterpart. I programmed an event listener for all CRUD operations that queues data on each action to be updated in the vector store. In short, I developed a system to synchronize my site data with my vector store. I briefly touch on this system, and show it in action, here: SolrAI Module: Embedding Content - YouTube

Unfortunately, I am unable to locate the post where I discuss this, but one method I have used to maintain context between chunks is to add a summary of the main document to each of its chunks. I also give each chunk the same title as the main document.
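
The general shape of such a listener-plus-queue setup might look like the sketch below. This is not the SolrAI module's code and does not use Drupal's API: the `SyncQueue` class, its method names, and the plain dictionary standing in for the vector store are all assumptions made for illustration.

```python
from collections import deque

class SyncQueue:
    """Collects CRUD events and later applies them to a vector store."""

    ACTIONS = {"insert", "update", "delete"}

    def __init__(self):
        self.pending = deque()

    def on_change(self, action: str, content_id: str, text: str = None) -> None:
        """Event-listener hook: record what happened, do no heavy work here."""
        if action not in self.ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        self.pending.append((action, content_id, text))

    def flush(self, store: dict, embed) -> int:
        """Apply queued events in order; `store` maps content ID -> vector."""
        processed = 0
        while self.pending:
            action, content_id, text = self.pending.popleft()
            if action == "delete":
                store.pop(content_id, None)
            else:
                # Insert and update are the same operation: overwrite by ID.
                store[content_id] = embed(text)
            processed += 1
        return processed
```

Queueing keeps the save operation in the CMS fast, since the embedding calls happen later when the queue is flushed.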
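
A minimal sketch of that idea, assuming the summary has already been written or generated elsewhere. The function name `contextualize_chunks` and the exact layout of the text that gets embedded are illustrative choices, not the poster's implementation.

```python
def contextualize_chunks(title: str, summary: str, chunks: list[str]) -> list[dict]:
    """Attach the parent document's title and summary to every chunk."""
    return [
        {
            "title": title,  # same title as the main document
            "chunk_index": i,
            # The text sent to the embeddings API: shared context first, then the chunk.
            "text_to_embed": f"Title: {title}\nSummary: {summary}\n\n{chunk}",
        }
        for i, chunk in enumerate(chunks)
    ]
```

The trade-off is that every chunk grows by the length of the summary, so a short summary keeps the extra token cost small.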

Here is a good conversation that touches on the subject: The length of the embedding contents - #43 by SomebodySysop


Here is the flowchart I put together for my embeddings synchronization process, in case it helps anyone else.