Vector embedding notes and chat history

flowalther · October 13, 2023, 2:41pm

I have a question for my general understanding:

I’m building a chatbot for a note-taking app that uses vector embeddings to find relevant user notes for an AI chat.

Right now, I vectorize the whole note (no matter how big) and store that result in Pinecone.

For my search input, I then vectorize the last 8 messages (as one chunk) and use this to query my DB.

This approach is simple and seems to work. Do I need to be more granular (e.g. index chunks of text rather than whole notes, or single messages from the chat)?

Some code examples:

Embedding and storing user notes when they are created:

const newNote = await prisma.note.create({
  data: {
    title,
    content,
  },
});

const embedding = await getEmbeddingsForNote(newNote);

await notesIndex.upsert([
  {
    id: newNote.id,
    values: embedding,
  },
]);

Embedding the chat history:

const chatMessages: Message[] = json.messages;

const messagesTruncated = chatMessages.slice(-8);

const queryEmbedding = await getEmbedding(
  messagesTruncated.map((message) => message.content).join("\n"),
);

const vectorQueryResponse = await notesIndex.query({
  vector: queryEmbedding,
  topK: 5,
});

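// Note: without an explicit sort, findMany is not guaranteed to return rows in the
// order of the `in` list, so Pinecone's similarity ranking is not preserved here.
const relevantNotes = await prisma.note.findMany({
  where: {
    id: {
      in: vectorQueryResponse.matches.map((result) => result.id),
    },
  },
});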

_j · October 13, 2023, 3:04pm

One concern with chunking is that a long document ends up with many more entries in the retrieval database than a short one, so it is over-represented in the results.

Users may get the impression that they see a lot more Harry Potter results than Edgar Allan Poe results, simply because of the chunking.
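
One way to soften that over-representation, sketched here under assumptions: if each chunk were stored together with the id of its parent note, the query results could be collapsed to the best-scoring chunk per note before picking the top few. The `ChunkMatch` shape and the `bestMatchPerNote` helper are hypothetical, not part of the code posted above or of any library.

```typescript
// Hypothetical shape: every chunk match carries the id of the note it came from.
interface ChunkMatch {
  chunkId: string;
  noteId: string;
  score: number; // higher means more similar to the query
}

// Keep only the best-scoring chunk for each note, then return the top `limit`
// notes ordered by that score.
function bestMatchPerNote(matches: ChunkMatch[], limit: number): ChunkMatch[] {
  const best = new Map<string, ChunkMatch>();
  for (const match of matches) {
    const current = best.get(match.noteId);
    if (current === undefined || match.score > current.score) {
      best.set(match.noteId, match);
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

With this, a long note that matched in three chunks still takes up only one slot in the final result list. Querying for more chunks than the number of notes you finally want leaves room for shorter notes to surface.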

At some point, though, you will need a chunking method anyway to handle notes as they get bigger, so that each piece stays under a size where you can still pass a few of them to the AI.
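
A minimal sketch of such a size cap, assuming that splitting on blank lines is acceptable for notes and that a character count is a good enough stand-in for a token count. `chunkNote` and `maxChars` are hypothetical names. A real setup would more likely measure tokens with the embedding model's tokenizer.

```typescript
// Split a note into chunks of at most `maxChars` characters, preferring
// paragraph boundaries (blank lines). Characters stand in for tokens here,
// which is an assumption, not how the embedding model measures length.
function chunkNote(content: string, maxChars: number): string[] {
  if (maxChars < 1) throw new Error("maxChars must be at least 1");
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of content.split(/\n\s*\n/)) {
    const text = paragraph.trim();
    if (text.length === 0) continue;
    // Hard-split any single paragraph that is longer than the limit.
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += maxChars) {
      pieces.push(text.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      const candidate = current.length === 0 ? piece : current + "\n\n" + piece;
      if (candidate.length <= maxChars) {
        current = candidate; // the piece still fits into the open chunk
      } else {
        chunks.push(current); // close the open chunk and start a new one
        current = piece;
      }
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk could then be embedded and upserted on its own, for example under an id derived from the note id plus the chunk index, with the parent note id kept in the record's metadata so results can be mapped back to notes.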

Conversational context is important, because you can’t just search on “what if that’s not true?”. One idea (of many) for handling conversation, at the cost of more AI calls, is to store in the chat history metadata an AI-rewritten version of each user question that includes all the context it needs to stand alone. Then you have a smaller past context to embed for the most recent question and an easy truncation point for older chat. You can also make a small on-demand call to ask “is this a brand new subject?”, so you don’t embed the earlier baseball chat along with the current dating advice question.
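
The query-building half of that idea might look roughly like the sketch below. It assumes the AI rewrite and the "new subject" check have already run elsewhere and stored their results on each message. The `standalone` and `newTopic` fields and the `buildQueryText` helper are hypothetical names for illustration.

```typescript
// Hypothetical message shape. `standalone` holds an AI-rewritten version of a
// user question that can be understood without the rest of the chat, and
// `newTopic` is set when a separate check flagged the message as a new subject.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  standalone?: string;
  newTopic?: boolean;
}

// Build the text to embed: the standalone user questions since the last topic
// change, limited to the most recent `maxQuestions`, oldest first.
function buildQueryText(history: ChatMessage[], maxQuestions: number): string {
  const questions: string[] = [];
  // Walk backwards from the newest message.
  for (const message of history.slice().reverse()) {
    if (questions.length >= maxQuestions) break;
    if (message.role !== "user") continue;
    // Fall back to the raw text when no rewrite has been stored.
    questions.unshift(message.standalone ?? message.content);
    // Anything older than a topic change belongs to a different subject.
    if (message.newTopic) break;
  }
  return questions.join("\n");
}
```

The returned string would take the place of the joined last 8 messages as the input to the embedding call, so older and unrelated chat never reaches the search vector.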

flowalther · October 13, 2023, 3:35pm

Thank you for your response.

What do you think about my “simple” approach of embedding a complete note as one block of text and using the last X messages of the chat history as the search vector?

I understand that this is not perfect, but will it perform okay?

_j · October 13, 2023, 3:40pm

It’s not perfect, but it will perform okay.

Since it is a language model doing the embedding, the vector should reflect the overall meaning the model takes away from reading the whole text, rather than behaving like a simple keyword search.

zakkaryvele81 · June 6, 2024, 6:33am

Vector embedding for notes and chat history sounds fascinating! While I’m not a tech whiz, I can share a cool app that might help similarly organize your stuff.
