Use file with text-davinci-001 to increase tokens in prompt

I am working with text-davinci-001 (formerly davinci-instruct-beta-v3) to generate answers to questions. The sources on which the answers should be based are in my prompt. Hence my prompts are long. Is it possible to upload a file containing the sources so that I can include > 2048 tokens? Thanks.

There is a function in the API for answering questions based on resources. Read up on it here.

That’s exactly why I was asking the question, because the answers endpoint does this:

“The endpoint first [searches] over provided documents or files to find relevant context. The relevant context is combined with the provided examples and question to create the prompt for [completion].”

I don’t need or want the above functionality in my workflow. I am using the embeddings endpoint to find the top n documents most similar to my user’s query. I want to dynamically populate a file with those top n documents (rather than include them inline) and use those n documents as the source of truth for generating answers using text-davinci-001.

That’s very interesting @lmccallum

What would be even more fascinating is if we could use embeddings directly in the answers API as the search model.

Agree. This might be something OpenAI will offer in the future.

Perhaps I can try a different method around the 2048-token limit. Each of my top n search results is associated with a unique ID. If I upload a JSON Lines file in advance containing the text versions of all of my embeddings, along with their unique IDs, then perhaps I could instruct GPT-3 to write the completion taking into account only the text associated with those IDs.
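For what it's worth, a JSON Lines file of ID/text pairs is easy to generate up front. A minimal sketch (the id and text field names are illustrative, not a required schema):

import json

passages = {
    "doc-001": "Text of the first source passage...",
    "doc-002": "Text of the second source passage...",
}

with open("sources.jsonl", "w", encoding="utf-8") as f:
    for doc_id, text in passages.items():
        f.write(json.dumps({"id": doc_id, "text": text}) + "\n")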

Thanks! I think that method could require too much manual quality control, given the length, complexity and interactions of my passages. But I will give it some thought.

We’re trying this workflow to cope with the 2048 token limit:

  1. Get the top n search results for the query from the embeddings endpoint.
  2. Each search result consists of an arbitrary number of paragraphs of text.
  3. Parse each search result into sentences.
  4. Re-rank the sentences based on their similarity to the query.
  5. Use only the top n sentences (up to a limit of 2048 tokens) in the prompt for text-davinci-001.
  6. Also in the prompt, provide instructions to answer the user’s query based strictly on the sentences.

This is essentially a filter that extracts the most relevant information for answering the user’s query before the prompt is built, which lets us keep the prompt short. We still achieve our goal of getting GPT-3 to use only the provided information to generate the answer.
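Roughly, steps 3 to 5 can be sketched like this in Python (the embedding engine name, the regex sentence splitter and the crude characters-per-token estimate are assumptions, not our exact implementation):

import re

import numpy as np
import openai

EMBEDDING_ENGINE = "text-similarity-babbage-001"  # assumed; any similarity engine works

def embed(texts):
    # One request for a batch of texts; returns one embedding per input.
    response = openai.Embedding.create(engine=EMBEDDING_ENGINE, input=texts)
    return [item["embedding"] for item in response["data"]]

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_sentences(search_results, query, token_budget=1500):
    # Step 3: parse each search result into sentences.
    sentences = []
    for result in search_results:
        sentences += re.split(r"(?<=[.!?])\s+", result)

    # Step 4: re-rank the sentences by similarity to the query.
    query_embedding = embed([query])[0]
    ranked = sorted(zip(sentences, embed(sentences)),
                    key=lambda pair: cosine(query_embedding, pair[1]),
                    reverse=True)

    # Step 5: keep top sentences until a rough token budget is reached.
    kept, used = [], 0
    for sentence, _ in ranked:
        estimate = len(sentence) // 4 + 1  # very rough ~4 characters per token
        if used + estimate > token_budget:
            break
        kept.append(sentence)
        used += estimate
    return kept

The kept sentences then go into the prompt, together with the instruction to answer strictly from them (step 6).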

I’ll let you know how well this works. Could be a useful workflow to share once ready.

Wow! This is quite an interesting approach. My guess is that there might be a noticeable delay between the user asking the question and the whole workflow returning an answer. But if that isn’t an issue, the quality of responses should hopefully be much better.

We’ve got it working! Yes it’s a bit slow. Perhaps we need to spend some money on compute resources? Also, the answers are of varying quality, so I need to fiddle with instructions, temperature, top n results to use, etc. We definitely have to use Davinci. Tried with Babbage and it was hopeless.

I think most of the delay comes from the multiple API calls chained together. Spending money on compute on your end isn’t going to help much. However, you could experiment with Azure or another cloud provider. I suggest Azure because OpenAI is itself hosted on Azure, so that should minimize network latency if the correct region is used.

Definitely go for davinci. I read a suggestion somewhere in the OpenAI docs to develop functionality first using davinci and then try to replicate it on less compute-intensive models like curie, babbage and ada.

Using davinci will slow down the process further though.

Also, you can reduce the delay by reducing the max_tokens used in the completions.
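For example, with the pre-1.0 openai-python client (the prompt and parameter values here are only illustrative):

import openai

prompt = "Answer the question using only the sentences below.\n\n<filtered sentences>\n\nQuestion: <user query>"

response = openai.Completion.create(
    engine="text-davinci-001",
    prompt=prompt,
    max_tokens=150,   # a smaller completion budget means a shorter wait
    temperature=0.2,  # lower values give more conservative answers
)
answer = response["choices"][0]["text"].strip()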

If you want to use GPT to answer questions about a text that is longer than 2048 tokens (or 4000, I think, for instruct v2), then in my experience the best approach is to split your document up into smaller pieces, use the embeddings API endpoint to get the embeddings for those pieces as well as for the question you are asking, compare the embeddings with cosine similarity, and finally use a prompt to formulate the answer. This is similar to how the (now deprecated) answers API of OpenAI works.

So.

  • Split your document up into pieces (sentences, paragraphs or something)
    How you want to split your document depends on your use case and the type of document. If you want to split it into sentences, for example, you can split the text on a period, question mark or exclamation mark delimiter. Make sure your document is plain text and stripped of any unneeded stuff like markup.
  • Fetch the embeddings for each bit
    Send a request to the OpenAI embeddings endpoint. This will return an array of numeric (float) values.
  • Fetch the embeddings for your question
    This will be used to find the most relevant document (piece).
  • Calculate semantic similarity using cosine similarity in whatever language you are using
    A cosine similarity function is a relatively simple function that calculates the similarity of two sequences of numbers, in this case your embeddings (there is a code sketch after these steps). Read more
  • Generate a prompt starting with the most relevant piece of text (the one with the highest similarity), followed by the question.
    If you are using instruct then you can simply append the question to the document piece separated by two newlines. For example:

In quantum computing, a qubit or quantum bit is a basic unit of quantum information, the quantum version of the classic binary bit physically realized with a two-state device.

What is a qubit?

If you are using the davinci base model, then it is better to phrase the question as a Q/A pair. For example:

In quantum computing, a qubit or quantum bit is a basic unit of quantum information, the quantum version of the classic binary bit physically realized with a two-state device.
Q: What is a qubit?
A:

This makes the model assume a question/answer scenario. Whichever model you choose to use, the response will be your answer.
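Putting those steps together, here is a minimal sketch with the pre-1.0 openai-python client (the embedding engine, the paragraph splitting and the file name are assumptions to adapt to your own setup):

import numpy as np
import openai

def embed(texts, engine="text-similarity-davinci-001"):
    response = openai.Embedding.create(engine=engine, input=texts)
    return [item["embedding"] for item in response["data"]]

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

document = open("document.txt", encoding="utf-8").read()

# Split the document into pieces (here: paragraphs).
pieces = [p.strip() for p in document.split("\n\n") if p.strip()]

# Fetch the embeddings for each piece and for the question.
question = "What is a qubit?"
piece_embeddings = embed(pieces)
question_embedding = embed([question])[0]

# Find the piece with the highest cosine similarity to the question.
scores = [cosine_similarity(question_embedding, e) for e in piece_embeddings]
best_piece = pieces[int(np.argmax(scores))]

# Build the prompt: most relevant piece, then the question as a Q/A pair
# (the base-model format from the example above).
prompt = f"{best_piece}\nQ: {question}\nA:"

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=100,
    temperature=0,
    stop=["\n"],  # stop at the end of the answer line
)
print(response["choices"][0]["text"].strip())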

Note: make sure to cache your embeddings locally or on your server, wherever your application is running. This will save you a lot of money; there is no need to fetch the embeddings for the same text more than once.
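One simple way to do that, sketched with a local JSON file keyed by engine and text (the file name and key scheme are arbitrary):

import json
import os

import openai

CACHE_PATH = "embedding_cache.json"

def cached_embedding(text, engine="text-similarity-davinci-001"):
    cache = {}
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, encoding="utf-8") as f:
            cache = json.load(f)
    key = f"{engine}:{text}"
    if key not in cache:
        # Only call the API for texts we have not embedded before.
        cache[key] = openai.Embedding.create(engine=engine, input=text)["data"][0]["embedding"]
        with open(CACHE_PATH, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return cache[key]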

I hope this helps somebody and if you have any questions, by all means ask.

This speaks to the exact problem I have been working on for a couple of days. Your post is extremely helpful and explained the concepts better than the docs (imo). It also cleared up a few questions, as the docs assume a higher level of prior knowledge than I had.

Most notably, the fact that you can cache the embeddings, and that the cosine similarity is done locally. Thank you!
