The best way to refer to history

I have several books, each around 1,000,000 tokens, and each with its own GUID.

Would it be possible to put all books (and their GUIDs) into OpenAI once and refer to each one (using GUID) when asking questions (without having to re-send book content)?

What, exactly, do you mean by,

Would it be possible to put all books (and their GUIDs) into OpenAI

The only imaginable way I could see this being done is via embeddings, but embeddings are limited to about 8,000 tokens each (and that would be a very large embedding). Also, you wouldn’t be putting the embeddings into OpenAI; you’d use some sort of vector DB solution.

In any event, you’d be limited to 8k or 32k tokens of context anyway…
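To make the embeddings route concrete, here’s a minimal sketch of what I mean: split each book into chunks small enough to embed, embed each chunk, and store the vectors in an index keyed by the book’s GUID. Everything below is illustrative, not a real implementation: `fake_embed` stands in for a call to a real embedding API, and the in-memory dict stands in for an actual vector DB.

```python
import hashlib
import math

def fake_embed(text: str) -> list[float]:
    # Placeholder for a real embedding API call; produces a
    # deterministic toy vector just so the sketch is runnable.
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in digest[:8]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk_text(text: str, max_words: int = 200) -> list[str]:
    # Split a book into chunks small enough to embed
    # (the real per-embedding limit is ~8k tokens, not words).
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# In-memory stand-in for a vector DB, keyed by book GUID.
index: dict[str, list[tuple[str, list[float]]]] = {}

def ingest_book(guid: str, text: str) -> None:
    # Done once per book; afterwards only the GUID is needed
    # to look up that book's chunks.
    index[guid] = [(chunk, fake_embed(chunk)) for chunk in chunk_text(text)]
```

The one-time ingestion is the part that looks like “sending the book somewhere”, but note the vectors live in your own store, not inside OpenAI.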

So, I guess I’m just not really sure what you’re imagining doing.

Apparently, you misunderstood!
With my app, users can choose a book (from several books) and ask their questions.

  • Resending the whole book (around 10 million tokens) with each question is not a good solution.
  • A summary of each book can’t be sent with every question either, since a summary loses too much information to stay accurate.
  • The books need to be sent to OpenAI and their Completion IDs need to be stored in the database and referenced with each question.

Yeah…

I think I’m still not getting it.

When a user submits a prompt asking about a specific book, you will always need to take their question and add to it some context tokens from the text of the book, yes?
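That “question plus context from the book” step is the retrieval pattern in a nutshell. A minimal sketch, assuming the chosen book’s chunks have already been embedded and stored somewhere (the chunk texts, vectors, and ranking here are all simplified placeholders):

```python
def cosine(a: list[float], b: list[float]) -> float:
    # Dot product; assumes the vectors are already normalized.
    return sum(x * y for x, y in zip(a, b))

def top_chunks(question_vec: list[float],
               chunks: list[tuple[str, list[float]]],
               k: int = 3) -> list[str]:
    # chunks: list of (text, vector) pairs for one book.
    # Rank by similarity to the question and keep the best k.
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    # The retrieved text still has to travel with every question.
    context = "\n---\n".join(context_chunks)
    return (f"Answer using only this excerpt:\n{context}\n\n"
            f"Question: {question}")
```

The point is that whichever chunks get selected, their text is re-sent with every single question; the GUID only tells you which book’s index to search.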

One thing I keep getting hung up on is when you say,

The books need to be sent to OpenAI and their Completion IDs need to be stored in the database and referenced with each question.

How do you imagine sending a book to OpenAI and what are you expecting OpenAI to do with a book you send them?

To the best of my knowledge OpenAI doesn’t have some magical data store for you to deposit text into for the model to reference.

If you want the model to have knowledge which isn’t already present in its training set, you need to provide that data every time you interact with the model and want it to reference that data in its response.

The standard way to add information to a model is through embeddings. But you still need to build the prompt with context, which is going to eat up tokens. There’s no free lunch.
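The “no free lunch” part can be made concrete with a token budget: however the context is retrieved, it has to fit inside the model’s window alongside the question and room for the answer. A rough sketch, using a crude words-to-tokens approximation (my assumption here, ~1.3 tokens per word; real code would use an actual tokenizer such as tiktoken):

```python
def fit_context(chunks: list[str], question: str,
                window: int = 8000, reserve_for_answer: int = 1000) -> list[str]:
    # Keep adding chunks (assumed pre-ranked by relevance) until
    # the estimated token budget for the context runs out.
    def est(s: str) -> int:
        # Crude estimate: ~1.3 tokens per word (assumption, not exact).
        return int(len(s.split()) * 1.3)

    budget = window - reserve_for_answer - est(question)
    picked = []
    for chunk in chunks:
        cost = est(chunk)
        if cost > budget:
            break
        picked.append(chunk)
        budget -= cost
    return picked
```

With an 8k window, only a few thousand tokens of book text fit per question, which is exactly why the whole-book idea doesn’t work.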

So, I guess that’s it, I just want to know what you mean by sending a book to OpenAI.