Vector embedding notes and chat history

I have a question for my general understanding:

I’m building a chatbot for a note-taking app that uses vector embeddings to find relevant user notes for an AI chat.

Right now, I vectorize the whole note (no matter how big) and store that result in Pinecone.

For my search input, I then vectorize the last 8 messages (as one chunk) and use this to query my DB.

This approach is simple and seems to work. Do I need to be more granular (e.g. index chunks of text rather than whole notes, or single messages from the chat)?

Some code examples:

Embedding and storing user notes when they are created:

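const newNote = await prisma.note.create({
  data: {
    // ... note fields (elided in the original post)
  },
});

const embedding = await getEmbeddingsForNote(newNote);

await notesIndex.upsert([
  {
    id: newNote.id, // assumed: the note's database id doubles as the vector id
    values: embedding,
  },
]);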

Embedding the chat history:

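const chatMessages: Message[] = json.messages;

// Use only the last 8 messages as search context.
const messagesTruncated = chatMessages.slice(-8);

// Embed those messages as a single block of text.
const queryEmbedding = await getEmbedding(
  messagesTruncated.map((message) => message.content).join("\n"),
);

const vectorQueryResponse = await notesIndex.query({
  vector: queryEmbedding,
  topK: 5,
});

// Load the full notes for the ids returned by the vector search.
const relevantNotes = await prisma.note.findMany({
  where: {
    id: {
      in: vectorQueryResponse.matches.map((match) => match.id),
    },
  },
});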

One concern with chunking is that a long document ends up with many more entries in the retrieval database than a short one.

Users may get the impression that they get back a lot more Harry Potter results than Edgar Allan Poe results, simply because the longer work was split into more chunks.
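One way to soften that over-representation is to collapse chunk hits back to one result per note before using them. This is only a rough sketch: the `ChunkMatch` shape and the `noteId` field are assumptions (for example, a note id stored in each chunk's metadata), not something from the code above.

```typescript
// Hypothetical shape of one vector-search hit when a note is stored as several chunks.
interface ChunkMatch {
  noteId: string; // assumed to be stored alongside each chunk
  chunkId: string;
  score: number; // similarity score, higher means closer
}

// Keep only the best-scoring chunk per note, then return the top `limit` notes.
function bestChunkPerNote(matches: ChunkMatch[], limit: number): ChunkMatch[] {
  const best = new Map<string, ChunkMatch>();
  for (const match of matches) {
    const current = best.get(match.noteId);
    if (current === undefined || match.score > current.score) {
      best.set(match.noteId, match);
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

With something like this you would probably query for more chunks than you need (a larger topK) and then keep the best few notes, so one long note cannot fill every slot.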

Still, at some point your chunking method has to handle notes that grow large, so that each stored piece stays under the size where you could pass a few of them to the AI.
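As a minimal sketch of what such a size cap could look like, the function below splits a note on blank lines and packs paragraphs into chunks of at most `maxChars` characters. The function name and the 2000-character default are made up for illustration, and character count is only a rough stand-in for the token count the model actually cares about.

```typescript
// Split a note into chunks of at most `maxChars` characters,
// preferring paragraph (blank-line) boundaries.
function chunkNote(text: string, maxChars = 2000): string[] {
  if (maxChars < 1) {
    throw new Error("maxChars must be at least 1");
  }
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    // Hard-split any single paragraph that is longer than the cap.
    for (let i = 0; i < paragraph.length; i += maxChars) {
      const piece = paragraph.slice(i, i + maxChars);
      if (current.length === 0) {
        current = piece;
      } else if (current.length + 2 + piece.length <= maxChars) {
        current += "\n\n" + piece; // still fits, keep packing
      } else {
        chunks.push(current);
        current = piece;
      }
    }
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}
```

A short note comes back as a single chunk, so the "whole note" approach is unchanged until a note actually outgrows the cap.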

Conversational context matters, because you can’t just search on “what if that’s not true?”. One idea (of many) for handling conversation, at the cost of more AI calls, is to store in the chat history metadata an AI-rewritten version of each user question that includes all the context it needs to stand alone. Then you have a smaller past context to embed for the most recent question and an easy truncation point for older chat. You can also make a small on-demand call to ask “is this a brand new subject?”, so you don’t embed the earlier baseball chat along with the current dating advice question.
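To make that idea a bit more concrete, here is a rough sketch of how the query text could be assembled once that metadata exists. The `ChatTurn` shape and the `standalone` and `newTopic` fields are hypothetical; they would be filled in by the extra AI calls, which are not shown here.

```typescript
// Hypothetical chat-history record. `standalone` and `newTopic` are assumed
// metadata produced by separate AI calls.
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
  standalone?: string; // AI-rewritten, self-contained version of a user question
  newTopic?: boolean; // true if this question starts a brand new subject
}

// Build the text to embed: the most recent user questions, oldest first,
// reaching back no further than the latest topic change.
function buildQueryText(history: ChatTurn[], maxQuestions = 3): string {
  const selected: string[] = [];
  for (let i = history.length - 1; i >= 0 && selected.length < maxQuestions; i--) {
    const turn = history[i];
    if (turn.role !== "user") {
      continue;
    }
    // Fall back to the raw message if no rewrite was stored.
    selected.unshift(turn.standalone ?? turn.content);
    if (turn.newTopic) {
      break; // older turns belong to a previous subject
    }
  }
  return selected.join("\n");
}
```

The returned string would then take the place of the joined last-8-messages text that gets embedded for the search.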

Thank you for your response.

What do you think about my “simple” approach of embedding a complete note as one block of text and using the last X messages of the chat history as the search vector?

I understand that this is not perfect, but will it perform okay?

It’s not perfect, but it will perform okay.

Since a language model is doing the embedding, the vector should reflect the state that reading the whole text puts the AI in, which is closer to overall meaning than to a simple keyword search.
