How do I "upload" a book to GPT3?

Hey guys,

I’d like to upload a book and ask questions about that specific book.

Would I have to put all the contents of the book into one/multiple JSONL files (if yes, is there a general structure or standards to follow)?

Alternatively, is there a way to tell if GPT-3 already has that book somewhere and to answer questions only from that book?

Thanks!

6 Likes

Hi there,

While it’s not possible to upload a book as-is, you could upload it as a JSONL file of “documents” of up to 2,048 tokens each (roughly 1,500 words). Each line in the file is one document and must be valid, UTF-8 encoded JSON.
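
To make the JSONL idea concrete, here is a minimal sketch of splitting a book into documents and serializing them one per line. It assumes that the 1,500-word figure above is a good enough proxy for the 2,048-token limit, and that each document lives under a `"text"` key; both are assumptions to check against the current file-upload documentation. The function names are made up for illustration.

```python
import json

def split_into_documents(text, max_words=1500):
    """Split text into chunks of at most max_words whitespace-separated words.

    Word count is only a rough stand-in for the token limit (assumption).
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def to_jsonl_lines(documents):
    """Serialize each document as one JSON object per line.

    The "text" key is an assumed schema, not a confirmed one.
    """
    return [json.dumps({"text": doc}, ensure_ascii=False) for doc in documents]

def write_jsonl(documents, path):
    """Write the documents to a UTF-8 encoded JSONL file."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_jsonl_lines(documents):
            f.write(line + "\n")
```

Splitting on a fixed word count can cut a sentence in half. Splitting on paragraph or chapter boundaries instead would probably give search better units to match against.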

As for telling whether GPT-3 already has the book and answering only from it: this is likely not possible, except perhaps for very popular quotes from very popular books (e.g. verses from the Bible or scenes from Lord of the Rings).

I hope that helps, and please let me know if you have any other questions!

Best,
Joey

3 Likes

Thank you for your reply Joey! Very enlightening response!

A couple more clarifications:

I guess that for every answer request to the API, the tokens in the file are searched and counted towards usage, right? So if I upload a file with 1,000 documents with 204,800 tokens in total, I’d be charged significantly for every request?

If yes, is there a way to “cache” it? haha

That’s correct.

You can find our pricing for the Answers endpoint here (near the bottom of the page).

Answers requests are billed based on the tokens in the input and answers.

The cost per token is based on which models are used for search and completion (controlled by the search_model and model parameters). If you upload a document, costs are largely based on the number of documents reranked (controlled by max_rerank) and the total length of those documents.

The length of examples, examples_context, question and the length of the generated answer (controlled by max_tokens / stop) will also impact costs.

Ada is a common search_model, which costs $0.0008/1000 tokens, meaning that it’d cost around 16 cents to search through your example of 204,800 tokens with Ada, plus the cost of the completion and the other parameters.
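
The arithmetic behind that 16-cent figure, as a small sketch. The price and token count are the ones quoted above; the function name is made up for illustration, and the result covers the search step only.

```python
def search_cost_usd(total_tokens, price_per_1k_tokens):
    """Cost of searching total_tokens at a given price per 1,000 tokens."""
    return total_tokens / 1000 * price_per_1k_tokens

# 204,800 tokens searched with Ada at $0.0008 per 1,000 tokens
cost = search_cost_usd(204_800, 0.0008)
print(f"${cost:.4f} per request, search only")
```

The completion itself and the other prompt parameters are billed on top of this, per request.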

1 Like

I’ve built this:

  1. Go through the entire book, building a list of snippets.
  2. Upload the list of snippets to GPT-3.
  3. Take a user question.
  4. Run Semantic Search to get the top 3 matches.
  5. Run Completion to generate an answer based on the top matches.
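
Steps 1 to 4 can be sketched roughly as follows. To keep the example runnable without an API key, it swaps the Semantic Search call for a toy bag-of-words cosine similarity, which is a much weaker stand-in and not what the search endpoint does. All names here are made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_matches(question, snippets, k=3):
    """Return the k snippets most similar to the question (toy ranking)."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:k]]
```

Step 5 would then pass the returned snippets, together with the question, to a completion call.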

I abandoned the project:

The problem is that you can never know when an answer is actually based on content from the book or when it is a pure confabulation of the model’s internal parameters.

As you can see in my outline above, I’m not even using the “answers” endpoint. That’s because the “answers” endpoint doesn’t allow you to engineer the prompt used for completion. By using the generation endpoint, I can tweak the prompt. That means I can, for instance, tell the model explicitly to not make up stuff. Or I can throw in some sample completions before.

So I’ve done all that. And I’ve also run tests with the different settings (temperature, etc.). But I just couldn’t solve the problem.

Basically, the only way to eliminate the problem of confabulation is to reduce the model’s flexibility to the point where it only outputs the snippet found by search verbatim. In that case, you don’t need a generation step at all. And you’ve basically built a semantic search engine. And since you must build it with the smallest model, ada or babbage, for cost reasons, there’s really no reason to use GPT-3 at all.

And as soon as you make the model flexible enough that it can transform the top matching snippet into an actual answer to the question that was asked, you introduce the risk of confabulation.

The thing is:

We want to use AI to reduce bullshit on the web – not to add to it. And if GPT-3 gives you three answers from a book that are spot on, and then simply makes up the next one with no basis at all, you’ve got a very dangerous system. You just can’t trust it.

I ran it on the book “Mouth Care Comes Clean” by Dr. Ellie Phillips. I used that one because I had read it, and so was able to judge whether or not a response was likely to be a good answer or not.

Sometimes, the answers were really enlightening. And it was super-cool to be able to just ask questions to the book!

But then, for instance, I’d “ask the book” what toothbrush the author recommends. And the model spit out some brand name. I don’t remember what it was - “Oral B Sensitive Plus”, or something. And I went back to the book and searched for these terms. And there was literally nothing in the book that came even close. So the model had simply made that answer up.
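
That manual check (searching the book for the terms in the answer) can be partly automated. Here is a crude sketch, assuming that an answer word which never occurs in the retrieved snippets is worth flagging. It is a hypothetical helper, not a confabulation detector.

```python
import re

def unsupported_terms(answer, snippets, min_length=4):
    """Return answer words (of min_length or more) absent from all snippets."""
    source_words = set(re.findall(r"[a-z0-9]+", " ".join(snippets).lower()))
    flagged = []
    for word in re.findall(r"[a-z0-9]+", answer.lower()):
        if len(word) >= min_length and word not in source_words and word not in flagged:
            flagged.append(word)
    return flagged
```

It matches exact words only, so it also flags harmless inflections ("recommends" vs. "recommend") and misses made-up claims that happen to reuse words from the book. At best it is a cheap first filter.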

I was really hopeful when I saw that OpenAI had introduced an “answers” endpoint. I thought they had solved that problem. But, last time I checked, the “answers” endpoint suffers from exactly the same problem. Maybe even worse, because you don’t have control over the prompt design.

There’s a reason that most successful GPT-3 apps to date are in the realm of marketing:

You just can’t trust that the output is based in truth.

If anybody finds a solution to this problem, please share it. That would really take GPT-3 to the next level, and make it useful for apps that rely on truth.

P.S. Here’s a screenshot of my app. As you can see, I have disabled the “Completion” step altogether and have the model simply output the top 3 matches from semantic search.

Since it’s not the kind of “Ask the Book” app I envisioned, I’ve never published it.

12 Likes

Hi Leo. I am also building a “stay-off-the-internet” app. In other words, like you, I want answers that stay strictly within the bounds of my own inputs. I am not surprised that the completions endpoint doesn’t work, because that endpoint was designed only to “complete this text based on what is known from reading the whole internet” not to “answer this question based on this specific body of text.” I do think semantic search is the first step towards answers. I haven’t looked closely at the documentation yet, but I thought that after the search step is done, the answers endpoint could be applied specifically against the max-reranked documents. Have you looked into this any further since you posted? Thanks.

I’m also working on this problem. Still experimenting with different approaches.

One idea that I have is to make confabulation a feature rather than a bug by positioning completions as “alt editions” or “spirit of the book”.

What an insightful post, thanks!

1 Like

I’m not 100% sure I understand what you mean. I believe you mean:

  1. run semantic search on your own corpus
  2. have GPT answer the question based on the top results of semantic search (top n, or top 1)

If that’s what you mean:

This is what I did.

My completion prompt will basically look like this:

Which toothbrush does the author of the book recommend?

Here is what the author has said:
###
QUOTE: I recommend a toothbrush with hard bristles.
QUOTE: A toothbrush needs to dry for 48 hours before you can use it again. Hence you need at least two.
QUOTE: Never use those travel toothbrushes with a lid. Bacteria will multiply in the moist conditions.
###
Answer the following question based only on the information above:

QUESTION: Which toothbrush does the author recommend?
ANSWER:

I’ve made this example up from memory.
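
A prompt like the one above can be assembled from the search results with a few lines of code. This is a sketch that mirrors the layout shown, with a made-up function name:

```python
def build_prompt(question, quotes):
    """Assemble a completion prompt from a question and retrieved quotes."""
    quote_lines = "\n".join(f"QUOTE: {quote}" for quote in quotes)
    return (
        "Here is what the author has said:\n"
        "###\n"
        f"{quote_lines}\n"
        "###\n"
        "Answer the following question based only on the information above:\n"
        "\n"
        f"QUESTION: {question}\n"
        "ANSWER:"
    )
```

Ending the prompt on "ANSWER:" leaves the model to continue from there, which is the slot the completion fills in.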

But the point is that GPT-3 will still, every now and then, completely make up an answer.

The only way to avoid this is to set the temperature to zero. But then you might as well just have your app spit out the first result from semantic search. Because with a temperature of zero, the model will not modify the snippet at all. So it will not take the information from the snippet and convert it into an actual answer.

If you want the model to convert the best matching snippet into an answer, you must increase the temperature. But as soon as you do that, you introduce the risk of confabulation again.

Hence, I regard this as an unsolvable problem for now and have stopped working on it.

I have read that some people have some success forcing GPT to “think a bit more” by not only asking for an answer, but also asking it to justify its answer. You may not actually want to use the justification it provides. But merely forcing the model to explain its reasoning seems to make it a bit more reliable.

How that would work for our particular problem, I don’t know.

There’s also research going on at Facebook and Carnegie Mellon, designed to spot and eliminate “hallucinated” answers. (In the context of NLP models, I personally prefer the word “confabulation”.) AFAIK, it’s nothing we could implement ourselves, since you need access to the actual neural net, I believe.

Personally, I think GPT-3 is impressive. But it is only faking “reasoning”. It is like a child that has learned a trillion song lyrics. Of course it will astound you in its ability to complete sentences. But it actually hasn’t learned the meaning of any single word. It does not even know that a real world exists to which the words are pointing.* It is just singing along.


*That is not to say that I am convinced that true intelligence would necessitate having a presence in the physical world. It might very well be that human language is all an A.I. needs to see in order to understand the world through our eyes and ears. But that doesn’t mean you can just dump petabytes of unstructured data at a neural net and hope it’s going to sort it all out on its own. Because if you do that, you have no way of knowing what your model has actually learned. It might give you correct answers 99% of the time. But then there’s that 1% of cases where it gives an answer that doesn’t make any sense at all by human standards. This is because you are dealing with a phonetic intelligence – not a reasonable one.

1 Like

Now that fine-tuning is available, I think this problem can be tackled by building a classifier which predicts whether or not an answer is based on the information in the context. Using such a classifier, it’s possible to generate a number of answers and keep only one whose predicted probability of being based on the provided context is high enough.
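
The generate-then-filter loop could look roughly like this. Both `generate` and `grounded_probability` are hypothetical callables standing in for a completion call and a fine-tuned classifier; the threshold and candidate count are arbitrary illustrative values.

```python
from typing import Callable, Optional

def pick_grounded_answer(
    context: str,
    question: str,
    generate: Callable[[str, str], str],
    grounded_probability: Callable[[str, str, str], float],
    n_candidates: int = 5,
    threshold: float = 0.9,
) -> Optional[str]:
    """Sample several answers and return the best one the classifier accepts.

    Returns None when no candidate reaches the threshold, so the caller
    can fall back to "I don't know" or to the raw search results.
    """
    best_answer, best_score = None, threshold
    for _ in range(n_candidates):
        answer = generate(context, question)
        score = grounded_probability(context, question, answer)
        if score >= best_score:
            best_answer, best_score = answer, score
    return best_answer
```

Returning None rather than the least-bad candidate is a deliberate choice here: the point of the filter is to refuse when nothing looks grounded.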

See this guide for details:

12 Likes

Thanks for the direction.
I’m curious whether this feature will be part of a future GPT-3 update, as I suspect many, if not most, users will require accurate answers from GPT-3.

1 Like

Yes, that’s something we’re thinking about, and we would like to come up with a solution which works well and is easy to use.

2 Likes

As many have pointed out, uploading large amounts of data is inefficient, and you don’t get good results because we do not have access to the neural net itself. Working with the net directly would not be easy for the majority of OpenAI’s target audience, but for those who are familiar with it, it would be like having super powers (could be dangerous too). Most of the magic is explained on Kaggle and HuggingFace.

What I learned is to make the GPT talk to itself because it reinforces some choices.

I just wish we had access to the neural net itself. The solutions I come up with are analogous to using tape and WD-40.

This is how I feel sometimes. hahaha
[Image: meme about engineers and problem solving]

1 Like

Great post.
Few things though:

  1. You can help the model answer “I don’t know” if you include that option in the prompt and/or in the fine-tuning file. It also helps if you specifically tell it that if it doesn’t know, it should answer “I don’t know”.
  2. Temperature 0 doesn’t mean it will copy the text. In many cases it will rephrase it, but as long as the information is there, the chances are higher that the answer will be truthful. The problem starts when it doesn’t have the information, so it makes it up.
  3. In your example, the answer isn’t there. Try adding a brand to the data and it will pick it up.
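
Point 1 can be sketched as a prompt that names the fallback explicitly, plus a small check for it on the way back. The wording of the instruction and the helper names are assumptions for illustration; how reliably the model follows the instruction would need testing.

```python
FALLBACK = "I don't know"

def build_prompt_with_fallback(question, quotes):
    """Build a prompt that tells the model what to say when the quotes lack the answer."""
    quote_lines = "\n".join(f"QUOTE: {quote}" for quote in quotes)
    return (
        "Here is what the author has said:\n"
        "###\n"
        f"{quote_lines}\n"
        "###\n"
        "Answer the following question based only on the information above.\n"
        f'If the answer is not in the quotes above, reply exactly "{FALLBACK}".\n'
        "\n"
        f"QUESTION: {question}\n"
        "ANSWER:"
    )

def is_fallback(answer):
    """True if the model's answer is the fallback, ignoring case and a trailing period."""
    normalized = answer.replace("\u2019", "'").strip().rstrip(".").strip().lower()
    return normalized == FALLBACK.lower()
```

When `is_fallback` returns True, the app can show the raw search results instead of an answer.
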
1 Like

@boris This is very useful, thanks for sharing.
It would be worthwhile to add it to the documentation.
It would be great to have another metric, similar to temperature, that ranks the model’s confidence in its own reply (if such data can be extracted).
We already know that it can actually tell when it doesn’t know, so I’m guessing that such information is there.
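
One candidate for such a metric: the completions endpoint can return per-token log probabilities when the `logprobs` parameter is set. A sketch of turning those into a single score follows, with the caveat that treating token probability as "confidence in the facts" is an assumption. A fluent confabulation can score high too.

```python
import math

def average_token_probability(token_logprobs):
    """Geometric mean of the token probabilities of a completion.

    token_logprobs is a list of natural-log probabilities, one per
    generated token. Returns a value between 0 and 1.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

Averaging in log space keeps long answers from being penalized just for having more tokens.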