Currently this code writes to the specified file as CSV, but you can alter it to use JSON or any other format that works for you.
For lookups at scale, you can use vector databases.
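As a rough sketch of what that looks like, here's an example with FAISS, an in-process vector index (a hosted vector database follows the same add/search pattern). The dimension of 1536 assumes text-embedding-ada-002 vectors, and the helper names are mine, not from the original code:

```python
# Minimal sketch: fast similarity lookups with FAISS instead of scanning a CSV.
import faiss
import numpy as np

DIM = 1536  # assumes text-embedding-ada-002 embeddings

index = faiss.IndexFlatIP(DIM)  # exact inner-product search

def add_messages(embeddings: np.ndarray) -> None:
    embs = np.ascontiguousarray(embeddings, dtype="float32")  # FAISS requires float32
    faiss.normalize_L2(embs)  # normalize so inner product == cosine similarity
    index.add(embs)

def most_relevant(query_embedding: np.ndarray, k: int = 5) -> np.ndarray:
    q = np.ascontiguousarray(query_embedding, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(q)
    scores, ids = index.search(q, k)
    return ids[0]  # row indices of the k most similar stored messages
```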
I’m using CSV because it lets me append new conversations to the file without having to read it first, which saves time and compute, and of course the code I wrote can be further optimized for speed.
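To make the append-only point concrete, here's a minimal sketch (the file path and column layout are assumptions, not the exact ones from my code):

```python
# Sketch: append one message + its embedding to a CSV file.
# Opening in "a" mode means we never read or rewrite existing rows.
import csv
import json

def append_message(path: str, role: str, content: str, embedding: list[float]) -> None:
    with open(path, "a", newline="", encoding="utf-8") as f:
        # Store the embedding as a JSON string so it survives the round trip through CSV.
        csv.writer(f).writerow([role, content, json.dumps(embedding)])

append_message("conversation.csv", "user", "How do embeddings work?", [0.01, -0.02, 0.03])
```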
This code doesn’t summarize the conversation; it just stores every message along with its embedding and retrieves the most relevant ones as specified by the mask.
You can write your own criteria for choosing the relevant messages.
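For instance, the simplest criterion is plain top-k cosine similarity between the new message's embedding and every stored embedding. A sketch (the function name is hypothetical):

```python
# Sketch of one possible selection criterion: top-k cosine similarity.
import numpy as np

def top_k_relevant(query_emb: list[float], stored_embs: list[list[float]], k: int = 5) -> list[int]:
    q = np.asarray(query_emb)
    m = np.asarray(stored_embs)
    # Cosine similarity = dot product divided by the product of vector norms.
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q) + 1e-10)
    return np.argsort(sims)[::-1][:k].tolist()  # indices of the k most similar messages
```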
You can add more code to it, e.g. if the relevant messages would consume too many tokens, they can be summarized before being sent for completion.
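Something like this, as a rough sketch: I'm counting tokens with tiktoken, and the 2,000-token budget and the summarization prompt are arbitrary choices. It uses the pre-1.0 openai SDK; with the newer SDK the call is `client.chat.completions.create(...)` instead.

```python
# Sketch: summarize retrieved messages when they exceed a token budget.
import openai
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fit_to_budget(messages: list[str], budget: int = 2000) -> str:
    joined = "\n".join(messages)
    if len(enc.encode(joined)) <= budget:
        return joined  # small enough, pass through unchanged
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarize the following conversation excerpts concisely."},
            {"role": "user", "content": joined},
        ],
    )
    return response["choices"][0]["message"]["content"]
```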
IMO this is neither the only nor the best approach; it’s a demo to acquaint learners with using embeddings for conversational context.
I’m not sure what you mean by bias in this context. As @anon22939549 shared, embeddings are used to separate the semantically relevant info from the rest of the content.
The code is written for the Chat Completions API, which means it can consume any model that is currently accessible through that API. You can simply change the model name in the API call to the chat model you want to use.
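Concretely, the swap is just this one argument (sketch using the pre-1.0 openai SDK):

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # swap for "gpt-4" or any other chat model; nothing else changes
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["choices"][0]["message"]["content"])
```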
You aren’t wrong here, but that was for gpt-3.5-turbo-0301; the latest 3.5 model, released on 0613, is better at understanding context and at following the system message.
However, you can use gpt-4 if you still want to; no changes are required in the code except the model name.