Get answers to a question from a book and output the relevant sentences

Hello, I’m trying to use ChatGPT to get answers from a book. The book is sliced into sentences, each with a unique id, and stored as a CSV. It’s about 15 MB, so attaching it to the prompt is not an option, as I’m planning to run this hundreds of times a day.

For example, here’s a sentence from the book:

He hung his hooded cloak on the nearest peg, and “Dwalin at your service!” he said with a low bow

When I ask the question “what should I do after hanging my clothes at the door?”, ChatGPT should give me the id of this sentence, because Dwalin hangs his cloak on the peg and says something, which is relevant to my question. It should also return other sentences in the book that seem relevant to my question.

The question itself is not optimized to target that specific sentence in the book; it’s just a random daily question. But I want to find sentences that seem relevant to the question.

So far I have tried several things, such as embedding the book and then extracting relevant sentences using cosine_similarity, but I didn’t get good results. Then I tried long-text instead of short-text; the results became somewhat acceptable, but I still couldn’t get predictable results, such as retrieving a specific sentence from the book even when asking targeted questions.
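For reference, the ranking step described above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not my actual code: the names `top_k_sentence_ids` and `sentence_vecs` are made up for the example, and the three-dimensional vectors are toy values standing in for real embeddings from whatever embedding model is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)


def top_k_sentence_ids(query_vec, sentence_vecs, k=5):
    """Return the k (sentence_id, score) pairs most similar to the query.

    sentence_vecs is a dict mapping sentence id -> embedding vector
    (hypothetical structure; in practice it would be loaded from disk).
    """
    scored = [
        (cosine_similarity(query_vec, vec), sid)
        for sid, vec in sentence_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(sid, score) for score, sid in scored[:k]]


# Toy vectors only; real embeddings have hundreds or thousands of dimensions.
sentence_vecs = {
    "s1": [0.9, 0.1, 0.0],
    "s2": [0.0, 1.0, 0.0],
    "s3": [0.7, 0.7, 0.0],
}
query_vec = [1.0, 0.0, 0.0]

results = top_k_sentence_ids(query_vec, sentence_vecs, k=2)
for sid, score in results:
    print(sid, round(score, 3))
```

Since the sentence embeddings only need to be computed once and can be stored, each question then costs one embedding call plus this local ranking, with no need to send the book along.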

Then, as a last resort, I embedded the sentences again, but this time I used ChatGPT to extract the key points of each sentence and embedded those keywords instead, for example:
input: ““Excuse me!” said the hobbit, and off he went to the door.”
output: “pardon hobbit say, open door”
But that gave me even worse results.

Is there any other way I can achieve this without attaching the whole file to each prompt?

Welcome @hedihadi45

You can use the file search tool in the Assistants API and give the assistant the whole book as a plain-text .txt file.

Instruct it to answer questions based only on the provided text document.


Thanks for the response @sps. I forgot to mention that I also gave the assistant a try, but it cannot return structured data (JSON), so I was very hesitant to use it. I guess there’s no better choice for now.

That’s an exciting idea. Thanks for sharing. It’s important to consider both accuracy and context when searching for the correct sentences in books. GPT-4 can handle such tasks well. I would like to see an example or a piece of code to see how it works in practice. You can also try using libraries like LangChain or other NLP tools. I think that authors of platforms like https://essay-company.org/ are actively using this because they have to write a lot of high-quality text. It’s better to use indexing to work with large books or to divide them into parts for easier processing.
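As one possible reading of "divide them into parts", here is a rough sketch that groups neighbouring sentences into overlapping windows while keeping track of which sentence ids each window covers, so a match on a window can still be mapped back to ids. The function name `build_windows`, the window size, and the placeholder sentences are all assumptions made up for this example, not something from the thread.

```python
def build_windows(sentences, size=3):
    """Group consecutive sentences into overlapping windows.

    sentences: list of (sentence_id, text) tuples in book order.
    Returns a list of dicts, each with the ids it covers and the joined text.
    Windows advance one sentence at a time, so every sentence appears in
    up to `size` windows.
    """
    windows = []
    # If the book has fewer sentences than `size`, emit a single short window.
    last_start = max(len(sentences) - size, 0)
    for start in range(last_start + 1):
        chunk = sentences[start:start + size]
        windows.append({
            "ids": [sid for sid, _ in chunk],
            "text": " ".join(text for _, text in chunk),
        })
    return windows


# Placeholder rows; in practice these would be read from the CSV.
sentences = [
    (1, "First sentence."),
    (2, "Second sentence."),
    (3, "Third sentence."),
    (4, "Fourth sentence."),
]

windows = build_windows(sentences, size=3)
for w in windows:
    print(w["ids"], w["text"])
```

Each window's text would then be embedded instead of the single sentences, and the ids of the best-matching windows returned as the answer. Whether this actually improves retrieval for a given book is something to test rather than assume.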