Generating output based on references

If I wanted to upload, let's say, 10 papers on a subject and then get GPT to base the content it writes only on those sources, is this possible? If so, how would it be done?

Welcome to the community!

Look into fine-tuning.

Good luck!

For question answering on a knowledge base, you can break up your papers into semantically relevant chunks (the most obvious choice is paragraphs) and get the vector embedding for each chunk. Get the embedding for the query on the fly, and use cosine similarity (or some other measure of similarity) to find the top n chunks in terms of semantic similarity to the query. Retrieve the original text associated with those top n chunks. Send that text plus the user query plus your instructions (e.g. “please answer the question based on the paragraphs listed below”) to text-davinci-003 for completion.

The prompt + completion cannot exceed 4096 tokens. (Some models are smaller; I use text-davinci-003 to maximize the number of tokens that can fit in my prompt.) The variable n represents the number of chunks of text that can fit into your prompt, leaving room for the user query, your instructions, and the “max tokens” you specify for the completion. The longer each chunk is, the lower n will be.

In my experience, figuring out the optimal way to break up the text into chunks is the most challenging task and has the biggest impact on how good the answers are. I hope this is helpful.
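To make the steps above concrete, here is a minimal Python sketch of the retrieval and prompt-building part. It is only an illustration under several assumptions: `toy_embed` is a word-count stand-in for a real embedding model (you would swap in your embedding API of choice), `estimate_tokens` is a rough 4-characters-per-token guess rather than a real tokenizer, and all function names and the 500-token completion budget are made up for the example. The resulting prompt string is what you would then send to the completion model.

```python
import math
import re
from collections import Counter


def split_into_chunks(text):
    """Split on blank lines so each paragraph becomes one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def toy_embed(text):
    """ASSUMPTION: stand-in for a real embedding model.

    Returns a sparse word-count vector instead of a learned embedding.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def estimate_tokens(text):
    """ASSUMPTION: crude estimate of about 4 characters per token."""
    return max(1, len(text) // 4)


def build_prompt(query, chunks, context_limit=4096, max_completion_tokens=500):
    """Rank chunks by similarity to the query and pack as many as fit."""
    instructions = (
        "Please answer the question based only on the paragraphs listed below."
    )
    # Reserve room for the completion, the instructions, and the query.
    budget = (
        context_limit
        - max_completion_tokens
        - estimate_tokens(instructions)
        - estimate_tokens(query)
    )
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, toy_embed(c)),
        reverse=True,
    )
    selected = []
    for chunk in ranked:
        cost = estimate_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        budget -= cost
    context = "\n\n".join(selected)
    prompt = f"{instructions}\n\n{context}\n\nQuestion: {query}\nAnswer:"
    return prompt, selected


if __name__ == "__main__":
    papers = (
        "Honey bees communicate the location of food with a waggle dance.\n\n"
        "Volcanoes form where magma reaches the surface of the crust."
    )
    prompt, _ = build_prompt(
        "How do honey bees communicate?", split_into_chunks(papers)
    )
    print(prompt)
```

In this sketch n is not fixed in advance: it falls out of the token budget, so longer chunks mean fewer of them end up in the prompt. In a real setup you would compute the chunk embeddings once and store them, and only embed the query at request time.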
