I experimented with the Assistant in Playground by enabling the retrieval feature and uploading an ebook as a text file, roughly 120K tokens in size. I instructed the assistant to impersonate a character from the book and to reference the text file in its responses.
Monitoring the usage dashboard, I observed that each query I posed consumed about 30K tokens of context input.
Here’s what I think happens:
- Once the ebook is uploaded, OpenAI splits it into approximately four chunks, each close to 30K tokens.
- Then, embeddings are created for these segments and stored in a vector database.
- For each question I ask, the retrieval function searches the vector DB, selects the chunk most relevant to my query, and incorporates the actual text of that chunk into the prompt as additional context.
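To make sure I'm describing the pipeline correctly, here is a minimal sketch of the flow I have in mind. This is my own toy illustration, not OpenAI's actual implementation: `chunk_text`, `embed`, and `retrieve` are hypothetical helpers, the chunker splits on words rather than real tokens, and the "embedding" is just a bag-of-words vector with cosine similarity standing in for a learned embedding model and vector DB.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size):
    # Step 1: split the document into fixed-size chunks.
    # (Word-based here; the real system would count tokens.)
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed(text):
    # Step 2: embed each chunk. Toy stand-in: a bag-of-words
    # frequency vector instead of a learned embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks):
    # Step 3: at query time, return the chunk most similar
    # to the question; its full text goes into the prompt.
    q = embed(query)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

# Tiny fake "ebook" for demonstration.
book = ("the dragon guarded the mountain " * 5
        + "the knight rode a horse to the castle " * 5)

# Smaller chunk_size -> less text injected per question,
# which is exactly the knob I'm hoping is configurable.
chunks = chunk_text(book, chunk_size=30)
context = retrieve("who rode the horse", chunks)
prompt = f"Context:\n{context}\n\nQuestion: who rode the horse"
```

If this matches what retrieval actually does, then the ~30K-token context I'm seeing would just be one retrieved chunk, and shrinking the chunk size would directly shrink the per-question token cost.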
Is my interpretation accurate? And if so, is there a setting that controls how the text is divided, so I could use smaller chunks? I'm looking to reduce the token count added to the prompt for each question.
Thank you for your insights.