I'm building an application that will act as a memory of my laptop activity, so I can ask it questions about what I've been doing and get answers. I've made a fair amount of progress, but I've hit a block. I can store my data in a database and run semantic search over my embeddings to answer my questions (as long as the question relates to my activity).
I'm using the OpenAI API, Pinecone for the DB, and LangChain. Before I cleaned my data to add it to the DB, my responses worked but weren't optimal; now they're optimal, but 80% of the time the request fails because I exceed the maximum token limit, which is normally 4096 — my usage is around 20,000 tokens per completion.
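Since the overflow comes from stuffing too much retrieved text into the prompt, one quick mitigation (independent of summarizing) is to cap how much retrieved context gets passed to the completion. A minimal sketch, assuming your search already returns chunks sorted by similarity; the ~4-characters-per-token estimate is a rough heuristic I'm assuming (a library like tiktoken would give exact counts):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fit_chunks_to_budget(chunks: list[str], budget_tokens: int) -> list[str]:
    """Keep retrieved chunks, most relevant first, until the budget is full."""
    selected, used = [], 0
    for chunk in chunks:  # assumed sorted by similarity score, best first
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected

# Usage: trim the Pinecone results before building the completion prompt.
results = ["short note", "a much longer activity log entry " * 50, "another note"]
print(fit_chunks_to_budget(results, budget_tokens=100))
```

With a budget of, say, 3000 tokens you leave headroom in a 4096-token context for the question and the answer.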
I'm really not familiar with this, but I thought that if I could summarize my data, the prompt would be smaller and it would take up less space in my DB and in the vector search. If so, how can I do this? Note that the data is appended in real time to a file in this format:
Date: 2024-01-11 5:34
xxxxxxxxx (a long text)
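That format is easy to split into individual (date, text) entries, which you'd want whether you embed them raw or summarized. A minimal sketch, assuming every entry starts with a `Date: YYYY-MM-DD H:MM` line exactly as above:

```python
import re

# One capturing group so re.split keeps the timestamps in the result.
ENTRY_RE = re.compile(r"^Date: (\d{4}-\d{2}-\d{2} \d{1,2}:\d{2})$", re.MULTILINE)

def parse_entries(raw: str) -> list[tuple[str, str]]:
    """Split the activity file into (timestamp, text) pairs."""
    parts = ENTRY_RE.split(raw)
    # parts looks like ["", date1, text1, date2, text2, ...]
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]

sample = "Date: 2024-01-11 5:34\nwrote some code\n\nDate: 2024-01-11 5:35\nread docs\n"
print(parse_entries(sample))
```

Each pair can then be summarized and embedded as its own vector, with the timestamp kept as Pinecone metadata so you can still answer "when" questions.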
I would like the summarization to run again each time something is added to this file.
If possible, I'd also like to avoid an expensive API, even if each call is small, because as I said the additions happen in real time — a new text is added every minute.
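For the real-time part, nothing fancy is needed: remember how far into the file you've read, and on a timer (say once a minute) pull only the newly appended text, summarize it with a cheap model, then embed and upsert the summary instead of the raw entry. A sketch under some assumptions — the prompt wording and the choice of gpt-3.5-turbo (much cheaper than gpt-4) are mine, and `summarize` assumes the `openai` v1 Python client is installed with OPENAI_API_KEY set:

```python
import os

def read_new_text(path: str, offset: int) -> tuple[str, int]:
    """Return (new_text, new_offset): everything appended since byte `offset`."""
    size = os.path.getsize(path)
    if size <= offset:
        return "", offset
    with open(path, "rb") as f:  # binary mode so the byte offset is exact
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8"), offset + len(data)

def summarize(entry_text: str) -> str:
    """Compress one log entry with a cheap model before embedding it.

    Assumption: the `openai` v1 client is available and OPENAI_API_KEY
    is set; gpt-3.5-turbo keeps the per-minute cost low.
    """
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Summarize this activity log entry in one or two sentences."},
            {"role": "user", "content": entry_text},
        ],
    )
    return resp.choices[0].message.content
```

Each summary then replaces the raw entry in the Pinecone upsert, so both the index and the retrieved prompt shrink, and you only pay one small gpt-3.5-turbo call per minute.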