I am using the code in Question_answering_using_embeddings.ipynb from the openai/openai-cookbook repository on GitHub as a reference.
My use case is simple as well: I want to read n documents and run Q&A over them. I know that gpt-3.5-turbo has a maximum text length of 4,096 tokens (~5 pages).
I have a few questions:
- I see in output[7] that the text was split into different rows. Was this done for accuracy? In other words, does splitting the text into smaller pieces, such as sentences, make the answers more accurate? Or can I use text of up to 4,096 tokens to build each embedding?
- How do I handle the case where I have more data than the max length? Any ideas?
- I have noticed that the questions I ask cannot be too “smart”. For example, if I have a text about a painter who painted a picture of blue flowers and I ask “Suggest a name for the painting”, I get no answer. What can I do to improve this?
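To make the first two questions concrete, here is a minimal sketch of the kind of splitting I have in mind. It is not the notebook's code: `approx_token_count` and `split_into_chunks` are names I made up for illustration, and counting whitespace-separated words is only a rough stand-in for a real tokenizer, so the actual token counts for gpt-3.5-turbo would differ.

```python
import re
from typing import List


def approx_token_count(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real tokenizer would give different (usually higher) counts.
    return len(text.split())


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = approx_token_count(sentence)
        # Close the current chunk if this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # Note: a single sentence longer than max_tokens still becomes
        # its own (oversized) chunk; this sketch does not split it further.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine ten."
    for chunk in split_into_chunks(sample, max_tokens=6):
        print(approx_token_count(chunk), chunk)
```

Each chunk would then get its own embedding, so a set of documents longer than the max length is never sent to the model in one piece. What I am unsure about is how small `max_tokens` should be for the best answers.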
Thank you!