In the moment I am brooding over a simple script/problem.
Given are several hundred documents (txt), between 10 and 30 pages
To every document a set of standardized questions will be posed, the answers stored in a cvs file
Since the documents are not connected and the context window of gpt-4-0125 is 128k tokens, my thought was to upload the txt file directly as a prompt and not go the vector-database-route.
Or is there a limit on how large the prompt can be?
Prompt 1: “I provide a text. Please keep it in mind, for I will pose a question. Please answer as short as possible./n/n{text}/n Please confirm with okay. After this, I will pose the question.” Answer 1: something like “Okay” or any other thing, doesn’t matter Question 1: “Please answer the following question: {text}” Answer 2: {answer}
What I want now is
to store “answer 2” in a cvs-file
delete answer 2 and question 1 from the chat history
pose a new question
repeat until all questions are posed
My goal is to not upload the text for every single question, but also I want the answers to the text to not interfere with each other. Every q-a-cycle shall be “fresh”.
I have bad news, but maybe even better news for you.
It’s possible that you might have a fundamental misunderstanding of how this technology works.
this is completely unnecessary. the model doesn’t “ingest” or mull about your text. For every token (“word”) generated, the model basically “reads” the whole text again.
So here’s the good news:
What you want to do is very possible. You don’t need to make the model forget, because it doesn’t need to - it won’t know what you don’t show it (apart from its trained knowledge, which was cut off somewhere in april '23).
so you can do stuff like this:
SYSTEM
You are QA bot. your job is to answer questions given a specific text. bla bla bla.
USER
here’s a text:
------------
{text}
------------
given the text above please answer the following question:
{question}
and that will spit out the response. let’s call it “response”
we can then go in and make another call:
SYSTEM
You are CSV Bot. Your job is to turn prose into structured CSV data.
your response must follow this schema:
{schema}
USER
here’s some free text:
------------
{answer}
------------
given the text above please turn it into a CSV.
Ensure that your response is a pure CSV. Do not include chatter. If your response is not a pure CSV, the system will crash. Start your response with the headers.
so tldr: no need to make it forget stuff if you can just roll back time