Optimizing Context Injection for Scalable Script

Currently, I use ChatGPT by sending three documents that detail a pre-sales approach script, which serve to educate the AI about my business model. Following that, I submit a transcript of a user call and ask the AI to assign a rating based on the provided parameters. However, sending these three documents with every prompt becomes prohibitively expensive when scaling up. How can I “teach” the API to internalize this context efficiently?

You can’t “teach” the API your pre-sales approach, however you can design a system which would work as follows:

  1. Generate embeddings of the pre-sales approach script and store the vectors along with the text chunks in a vector database (this is done only once, not every time the user interacts with your assistant)
  2. Make a first API call instructing the GPT they are an assistant whose purpose is to summarize in 1. detailed paragraphs and 2. short explanations the different parts of a transcript and use a function call which accepts two arrays of strings as a parameter
  3. For every short summary in the array, create an embedding and query for similar semantic results in your vector database (where your pdf embeddings are stored)
  4. Make another API call for each summary and pass the matching text chunks as context along with the detailed summary as the user message, and make a good system prompt which instructs the GPT to give a rating of the user message according to the pre-sales approach script guidelines (that is the context you passed). Use structured outputs to get the rating
  5. Combine all the ratings to get the final rating and make an additional API call which instructs the GPT to respond to the user

Thinking about it, you could combine 5 on a single API call instead of making a call per each summary and the result would most likely be the same.

TLDR: Store the pdf as embeddings, ask the GPT to split the transcript into pieces, retrieve the relevant parts of the script according to each part of the transcript, pass it as context and generate the rating