Based on a specific Wikipedia page

I’d like to build a ChatGPT-like system that prioritizes a body of text data I give it, e.g. a Wikipedia article.
(I know I can’t use ChatGPT itself through the API; via the API I would use text-davinci-003, or davinci if I do fine-tuning, etc.)

What should I do to make this possible?

For example, if the text is about the soccer player Messi, the question could be “What was the last title won by Messi? When and where was it held?” and the ideal answer would be “The World Cup, held in Qatar in 2022.”

Incidentally, I have confirmed that the system works correctly when the relevant part of the Wikipedia article and the question are placed in a single prompt.
However, I am wondering what to do when the amount of data becomes too large for one prompt.
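
For reference, the working single-prompt setup might look roughly like this (a minimal sketch using the openai Python library v0.x with text-davinci-003; the context string and prompt wording are illustrative):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Hypothetical inputs: an excerpt of the article plus the question.
context = "Lionel Messi won the 2022 FIFA World Cup with Argentina in Qatar."
question = "What was the last title won by Messi? When and where was it held?"

# Stuff the reference text and the question into one prompt.
prompt = (
    "Answer the question using only the reference text below.\n\n"
    f"Reference text:\n{context}\n\n"
    f"Question: {question}\n"
    "Answer:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=100,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```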

I thought about splitting the knowledge across multiple turns, as in a ChatGPT conversation, but from my research it seems that the OpenAI API is stateless and does not remember previous prompts.
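
As far as I can tell, the only workaround for that statelessness is to re-send the running transcript with every call, which makes the prompt grow with each turn. A rough sketch (the function and variable names are my own):

```python
import openai

history = []  # transcript of previous turns, re-sent on every call

def ask(question, context):
    transcript = "\n".join(history)
    prompt = f"{context}\n\n{transcript}\nQ: {question}\nA:"
    response = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=100, temperature=0
    )
    answer = response["choices"][0]["text"].strip()
    # The API won't remember this turn, so we have to.
    history.append(f"Q: {question}\nA: {answer}")
    return answer
```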

I also thought about fine-tuning, but I am having trouble figuring out how to prepare the dataset. I don’t want to prepare question–answer pairs one by one, for example “What year did Messi win the World Cup?”: “2022”.
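
To illustrate why this feels tedious: the legacy fine-tuning endpoint for models like davinci expects a JSONL file of prompt/completion pairs, essentially one hand-written pair per fact. A sketch of preparing such a file (the pairs are illustrative):

```python
import json

# Each fact becomes one hand-written prompt/completion pair --
# the one-by-one preparation described above.
pairs = [
    {"prompt": "What year did Messi win the World Cup? ->",
     "completion": " 2022\n"},
    {"prompt": "Where was the 2022 World Cup held? ->",
     "completion": " Qatar\n"},
]

with open("train.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```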

Wikipedia is just an example; in practice I will prepare my own body of text to use.

If you know of any good ideas or examples, please let me know.

Hey, did you ever figure this out? Thanks.

There are many ways to solve this problem. Embeddings and fine-tuning are good for situations where you want GPT to respond in specific ways, but depending on what you’re most comfortable with, I’d recommend trying the zero-shot ReAct agent implementation from LangChain:
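
A minimal sketch of that agent (assuming an early-2023 LangChain release with the wikipedia package installed; the tool choice and question are illustrative, not the official example):

```python
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)  # reads OPENAI_API_KEY from the environment

# The built-in "wikipedia" tool lets the agent look facts up on demand.
tools = load_tools(["wikipedia"], llm=llm)

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

agent.run("What was the last title won by Messi? When and where was it held?")
```

The agent decides on its own when to call the Wikipedia tool, so the source text never has to fit into a single prompt.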

If you’re still having issues, or you’re building something at scale, I’d recommend having a look at this example from the OpenAI Cookbook:
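
The core idea in that cookbook example is embeddings-based retrieval: split the source text into chunks, embed every chunk once, then at question time embed the question, pick the most similar chunks, and put only those into the prompt. A hedged sketch with the v0.x openai library (the chunking and chunk contents are placeholders):

```python
import numpy as np
import openai

EMBED_MODEL = "text-embedding-ada-002"

def embed(texts):
    resp = openai.Embedding.create(model=EMBED_MODEL, input=texts)
    return [item["embedding"] for item in resp["data"]]

# 1. Split the source text into chunks and embed them once, up front.
chunks = [
    "Lionel Messi won the 2022 FIFA World Cup with Argentina in Qatar.",
    "Messi joined Paris Saint-Germain in 2021.",
    # ... the rest of the article, chunked by paragraph or section
]
chunk_vectors = np.array(embed(chunks))

# 2. At question time, retrieve the k most similar chunks.
def top_chunks(question, k=2):
    q = np.array(embed([question])[0])
    # OpenAI embeddings are unit-length, so dot product = cosine similarity.
    scores = chunk_vectors @ q
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# 3. Stuff only the retrieved chunks into the prompt, as in the
#    single-prompt setup above, keeping it within the token limit.
```

Because only the top-k chunks go into the prompt, the prompt stays roughly the same size no matter how large the source text grows.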